% fortune -ae paul murphy

From Chapter One: Data Processing and the IBM Mainframe

This is the third excerpt from the second book in the Defen series: BIT: Business Information Technology: Foundations, Infrastructure, and Culture

Roots (part One)

Herman Hollerith, then a statistician with the US Bureau of the Census, formed a company called the Tabulating Machine Co. in 1896 to commercialize technology he had developed to help with the counting processes used in the 1890 US census.

That technology, derived in large part from the programmable Jacquard loom of 1804, consisted of two parts:

    1. As shown opposite, a paper card, standardized some forty years later by IBM's James Bryce as having 12 rows and 80 columns, into which holes could be punched at row-column intersections to represent values.

    2. A mechanical tabulator which read values from a stack of input cards and did something with them. There were initially only two things it could do:

      1. sort the cards according to the values punched in a given column; or,
      2. count cards and add up the values punched in a given column.

      So, to count people, the census process created a card giving the total for each enumeration area, sorted these cards by state or town, used the machine to total each stack, entered those totals on new cards, and then totalled up state populations by processing those.
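    The two-pass tally described above can be sketched in modern terms. This is a hypothetical sketch - the field names and figures are invented - but the mechanics are the same: sort cards into stacks by a key, total each stack onto a new card, then total the new deck.

```python
# Hypothetical sketch of the census tally process described above:
# each "card" holds an enumeration-area total; cards are sorted into
# stacks, each stack is totalled, and the totals become a new,
# smaller deck that is totalled in turn.

from collections import defaultdict

def sort_into_stacks(cards, key):
    """Mechanical sort: one stack per distinct key value."""
    stacks = defaultdict(list)
    for card in cards:
        stacks[card[key]].append(card)
    return stacks

def tabulate(stack, field):
    """Count cards and add the values in one field."""
    return len(stack), sum(card[field] for card in stack)

# A deck of enumeration-area cards (invented data).
deck = [
    {"state": "NY", "people": 1200},
    {"state": "NY", "people": 800},
    {"state": "PA", "people": 950},
]

# First pass: sort by state, total each stack, "punch" new cards.
new_deck = []
for state, stack in sort_into_stacks(deck, "state").items():
    _, total = tabulate(stack, "people")
    new_deck.append({"state": state, "people": total})

# Second pass: total the new, smaller deck for a grand total.
_, grand_total = tabulate(new_deck, "people")
print(grand_total)  # 2950
```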

    The Tabulating Machine Co was a business success and led to the formal incorporation of IBM's immediate predecessor, the Computing-Tabulating-Recording Company (C-T-R), in New York State on June 16th, 1911.

    By the 1930s the technology had advanced to the point that users could lease an electric card punch from IBM that would automatically sort cards and then enter subtotals in each sort group onto new cards using both punched holes and typed text to represent the values.

    Using this type of gear for accounting purposes required a number of well-defined steps in a discipline known in the 1920s as "data processing."

    By 1931 Data Processing was the subject of widespread formal training with 51 public high schools in one state offering curriculums built around their own class room equipment, hundreds of vendor operated schools, and a substantial state funded adult education effort with 27,500 graduates in 1930.

    That was then, this is now - and nothing much has changed
    The audit process for this type of work was to:

    1. Physically search through card bundles to look for out of place cards;

    2. Compare a statistical sample of cards to the original documents;

    3. Manually add the values from one or more sample groups, and,

    4. Perhaps contact a sample of those involved to determine the accuracy and probable completeness of the document files.

    Now auditors sample the data files using packaged SQL, load the output into a spreadsheet, compare those records to the original documents, and sometimes contact a sample of those involved to determine the accuracy and probable completeness of the document files.
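    The modern sampling step can be sketched roughly as follows. This is a hypothetical sketch using an in-memory SQLite table with invented data; real audit packages differ, but the shape - pull a random sample with SQL, then check each record against the source documents - is the same.

```python
# Hypothetical sketch of the modern audit step described above:
# select a random sample of records with SQL, then compare each
# sampled record to the "original documents" (here just a dict).

import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [(i, 100.0 + i) for i in range(1000)])

# Sample 50 of the 1000 rows (SQLite has no TABLESAMPLE, so we
# sample ids in Python and fetch just those rows).
ids = [row[0] for row in conn.execute("SELECT id FROM invoices")]
sample_ids = random.sample(ids, 50)
placeholders = ",".join("?" * len(sample_ids))
sample = conn.execute(
    "SELECT id, amount FROM invoices WHERE id IN (%s)" % placeholders,
    sample_ids).fetchall()

# Compare each sampled record to the source documents.
source_documents = {i: 100.0 + i for i in range(1000)}
mismatches = [(i, amt) for i, amt in sample
              if source_documents.get(i) != amt]
print(len(sample), len(mismatches))  # 50 0
```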

    Thus a late-1920s IBM proposal to process several million railroad waybills per month included the following steps:

    1. Transfer key information about sources, destinations, rates, and weights from each paper waybill to punched cards;

    2. Sort cards mechanically to create separate card sets for each source-destination pair;

    3. Sort cards within source-destination pairings by weight group and rate;

    4. Sum revenues in each final grouping with the machine punching out new cards giving revenues by rate group for each source-destination pair; and,

    5. Transcribe the punch card output to paper by typewriter.
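    The five steps above map neatly onto a modern batch pipeline. A hypothetical sketch, with invented waybill data - sorting covers steps 2 and 3 in one pass, grouping and summing cover step 4, and printing stands in for the typewriter transcription of step 5:

```python
# Hypothetical sketch of the five waybill steps described above:
# keypunch, sort by source-destination pair, sub-sort by rate,
# sum revenues per final grouping, then "transcribe" to paper.

from itertools import groupby

waybills = [  # step 1: key data punched from paper waybills (invented)
    {"src": "CHI", "dst": "NYC", "rate": 2.0, "weight": 10},
    {"src": "CHI", "dst": "NYC", "rate": 2.0, "weight": 30},
    {"src": "CHI", "dst": "STL", "rate": 1.5, "weight": 20},
]

# Steps 2 and 3: a single sort produces both groupings at once.
key = lambda w: (w["src"], w["dst"], w["rate"])
cards = sorted(waybills, key=key)

# Step 4: total revenue per final grouping, "punching" new cards.
summary_cards = []
for (src, dst, rate), group in groupby(cards, key=key):
    revenue = sum(rate * w["weight"] for w in group)
    summary_cards.append((src, dst, rate, revenue))

# Step 5: transcribe the output to paper.
for card in summary_cards:
    print("%s-%s rate %.2f revenue %.2f" % card)
```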

    Other companies, notably Burroughs, Remington Rand, and NCR, built competitive equipment, but IBM dominated the high end of the business and so entered the computer era as the world's leading data processing company.

    Initially no one at IBM saw, or at least acted as if they saw, the emergence of digital data processing as either a business threat or a business opportunity - and the inventors of the computer didn't see IBM as a player either. Thus when John Atanasoff in the US and Konrad Zuse in Germany independently developed the first working digital computers in the late thirties and early forties, neither sought help from commercial interests and both targeted only military and research uses.

    IBM was involved in code breaking and logistics management during the Second World War, but mainly as a supplier of machines and expertise based on mechanical card processing. IBM's later role in the development of scientific processors evolved naturally from that foundation, but the breakthrough linking the old accounting machines to the new didn't come from within IBM.

    When IBM introduced the 603 in 1946 and the 604 in 1948, they were offered for sale as "Electronic Calculating Punches" - machines designed to automate repetitive calculations and produce output cards suitable for use with the older electro-mechanical accounting gear the company sold. The 604 was, however, a real digital computer, capable of storing up to 60 program steps and able to apply all four basic arithmetic functions within those steps. As such it, like the smaller 603, could be programmed to act as a controller for a card sorter - and engineers at Northrop Aircraft did just that when they physically connected a 604 to an earlier IBM card sorting machine intended for accounting applications.

    The result was a new commercial discipline: automatic data processing [ADP]. In automatic data processing, many of the manual steps in the old electro-mechanical process, particularly batch sequencing and the steps involved in preparing the results of one batch as input to the next step in the series, could be done "automatically."

    This use of a 604 had an immediate creative consequence: a proposal by two IBM staffers, Stephen Dunwell and Werner Buchholz, to build a "Datatron" - a machine combining the best of both the new digital tools and the older electro-mechanical tabulators.

    That proposal initially went nowhere, except that two core models of the next generation, the 702/705 series, were distinguished by having the wiring already in place for "the Northrop connection" and were offered to businesses for commercial use in automated data processing.

    This series sold an unprecedented 5,500 units - thus setting off IBM's transition from a producer of electro-mechanical computational equipment for business to one focused on applying the new digital tools to the same problems.

    Programming the earliest machines was extremely difficult and time consuming because it all had to be done at the level of binary code. By 1952, however, Grace Hopper - who had formerly worked with the Mark I, II, and III series at the Bureau of Ordnance Computation Project at Harvard University's Cruft Laboratory and had since joined Eckert-Mauchly (by then part of Remington Rand) - had developed the first non-binary programming language.


    Machine instruction sets can be thought of as similar to keys on a piano. Hit, or execute, one and something distinctive happens. Since the IBM 360 used 8-bit bytes, it could have up to 2^8 = 256 distinct instruction codes hardwired into the machine, each triggering the matching instruction on receipt of its numeric code.

    Thus the specific action to add the values found in registers (storage areas on the CPU) A and B and place the result in register C might be triggered if the machine receives instruction code 44 - just like the piano would sound middle C if that key were pressed.

    An executable program therefore looks something like:

    0001110 001100 000010 010111 111111 000100 000000 001100 000011
    0001110 010111 111110 000100 000000 001100 000110 010111 111111
    0001000 000100 000000 001100 000111 010111 111111 000100 000000
    0001010 001100 000110 010111 111111 000100 000000 001100 000111

    and consists of a mix of data and instructions for the CPU to work with.

    In most systems these are referred to as binary files, because of the base two representation used. In the IBM mainframe world these are also known, for historical reasons, as object, image, or loadable code.
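    The piano analogy above amounts to a dispatch table: each numeric code triggers exactly one hardwired action. A hypothetical sketch - instruction code 44 and the register names are the invented example from the text, not real 360 opcodes:

```python
# Hypothetical sketch of opcode dispatch as described above: each
# numeric instruction code triggers one hardwired action, the way
# a piano key sounds one note. Code 44 is the invented "add" example.

registers = {"A": 3, "B": 4, "C": 0}

def op_add_ab(regs):
    """Add registers A and B, placing the result in C."""
    regs["C"] = regs["A"] + regs["B"]

# The "hardwiring": a table mapping instruction code to action.
instruction_set = {44: op_add_ab}

def execute(code, regs):
    instruction_set[code](regs)

execute(44, registers)
print(registers["C"])  # 7
```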

    Macros started out as commonly used instruction sequences - literally packs of cards (known as load modules) that users could splice into jobs to avoid repeatedly re-entering the same binary sequences. Macros soon grew names - "Add_AB" - and people started to assemble these into real programs.

    The first compilers read instructions consisting of these macro names interpolated in a binary data or instruction stream, and "compiled" the output stream by substituting in the binary expansions of the macros.
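    That substitution step is simple enough to sketch directly. A hypothetical sketch - the macro name "Add_AB" is the example from the text, and the binary strings are invented stand-ins for card sequences:

```python
# Hypothetical sketch of early macro expansion as described above:
# the "compiler" scans a stream of tokens and substitutes the
# stored binary sequence wherever a macro name appears.

macros = {  # named card packs of binary (invented sequences)
    "Add_AB": ["000100", "001100"],
}

def compile_stream(tokens):
    out = []
    for token in tokens:
        # Expand macro names; pass raw binary through unchanged.
        out.extend(macros.get(token, [token]))
    return out

program = ["010111", "Add_AB", "111111"]
print(compile_stream(program))
# ['010111', '000100', '001100', '111111']
```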

    Assembler is considered a first generation language because its instructions consisted of macro names (although the term "macro" is now used for other purposes too) and it took only one step to generate the binary code.

    COBOL is second generation because most COBOL compilers implicitly generate assembler as an intermediate output before generating the binary output - making COBOL two code generation steps removed from binary.

    Known as A-0, Hopper's language consisted of little more than a set of rules governing scripts which could then be mechanically processed to assemble predefined binary programs from card decks or paper tape in the right order to produce a ready-to-run job deck. That set the stage, however, for the development of an immediate successor, B-0, which evolved by the mid fifties into a language called FLOW-MATIC.


    Some notes:

    1. These excerpts don't include footnotes, and most illustrations have been dropped as simply too hard to insert correctly. (The wordpress html "editor" as used here enables a limited html subset and is implemented to force frustrations like the CP/M line delimiters MS-DOS inherited.)

    2. The feedback I'm looking for is what you guys do best: call me on mistakes, add thoughts/corrections on stuff I've missed or gotten wrong, and generally help make the thing better.

      Notice that getting the facts right is particularly important for BIT - and that the length of the thing plus the complexity of the terminology and ideas introduced suggest that any explanatory anecdotes anyone may want to contribute could be valuable.

    3. When I make changes suggested in the comments, I make those changes only in the original, not in the excerpts reproduced here.

    Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.