% fortune -ae paul murphy

From Chapter one: Data Processing and the IBM Mainframe

This is the fourth excerpt from the second book in the Defen series, BIT: Business Information Technology: Foundations, Infrastructure, and Culture.

Note that the section this is taken from, on the evolution of the data processing culture, includes numerous illustrations and information inserts omitted here.

Roots (part two: COBOL)

With Flowmatic, the program that read the script (the instructions telling the computer in what order to assemble pre-made binary modules into a runnable program) gained the ability to transfer data values between those modules. The result was a compilation of both code and data, which warranted renaming the tool a compiler instead of an assembler.

Further work on Flowmatic formed the experimental basis for a 1952 paper by Dr. Hopper on compiler theory. That paper heavily influenced a team led by John Backus at IBM's Watson Scientific Laboratory, which worked between 1954 and 1957 to develop FORTRAN (Formula Translation) for scientific work.

BASIC
In 1964 two Dartmouth College mathematicians, John Kemeny and Thomas Kurtz, produced a simplified, Fortran-like language that eased array management and dropped most variable declarations. It was intended for students not pursuing programs in math or science and was known as the Beginner's All-purpose Symbolic Instruction Code, or BASIC.

Subsequent progress in the development and use of Flowmatic fell somewhat outside the mainstream of computing research at the time, largely because most hardware and software research focused on numerical processing rather than automatic data processing. Thus most of the software research focused on languages like Fortran, and most of the hardware research focused on developing and managing larger memory sub-systems and faster floating point processors.

When Remington Rand acquired the Eckert-Mauchly Computer Corporation in 1950, and with it Dr. Hopper's programming group and the work that would lead to Flowmatic, its executives inherited the UNIVAC line and with it a focus on naval logistics and related problems. As a result, Remington Rand funded additional research on compilers and on the problems of storing and manipulating characters rather than numbers.

By 1957, a year after IBM introduced the first commercial disk drive, Flowmatic had developed to the point of commercial release. This prompted other commercial computer manufacturers, eager to take advantage of the new tool, to press immediately for its standardization. Because Flowmatic's development at Remington Rand (by then Sperry Rand) had been funded as an unclassified project by the U.S. Navy, it was theoretically in the public domain, and various companies, including IBM, used this as part of their basis for laying claim to it.

That pressure resulted in a 1959 Conference on Data System Languages (CODASYL) at which IBM representative Bob Bemer and others succeeded in having the consensus version of Flowmatic made widely available and renamed Common Business Oriented Language, or COBOL.

As a programming language COBOL inherited all of its key characteristics from Flowmatic and the card deck management problem it addressed. Thus COBOL's designation as the primary language for commercial data processing directly perpetuated those ideas and later heavily influenced how people saw and managed commercial system design. This effect transcended both hardware and software since people thought in terms of COBOL and built both hardware and applications accordingly.

Since Flowmatic's fundamental design reflected the problem its designers were dealing with (handling naval logistics with the mechanical tabulators available to them), it focused on automating program flow in a step-wise process built around first sorting, and then tabulating, punch cards.

Thus the fundamental operations in COBOL are to attach a file or device, read from the file or device into memory, do something with the data now in memory, and write out the result.
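As a rough modern illustration of that cycle, and of the sort-then-tabulate pattern it grew out of, here is a minimal Python sketch. It is not COBOL and does not reproduce how Flowmatic or any COBOL runtime worked. The 'KEY,AMOUNT' card format and the names `sort_deck` and `tabulate` are hypothetical. Sorting first means all cards for one key arrive together, so a single sequential pass can write a subtotal each time the key changes (a "control break").

```python
import io
from typing import Iterable, List, TextIO


def sort_deck(cards: Iterable[str]) -> List[str]:
    """Sort hypothetical 'KEY,AMOUNT' card images by KEY (the card-sorter pass)."""
    # sorted() is stable, so cards sharing a key keep their original order.
    return sorted(cards, key=lambda card: card.split(",", 1)[0])


def tabulate(cards: Iterable[str], outfile: TextIO) -> int:
    """Read each card, accumulate a subtotal, and write it when the key changes.

    Returns the number of subtotal lines written.
    """
    current_key = None
    subtotal = 0
    written = 0
    for card in cards:                       # read the next card into memory
        key, amount = card.strip().split(",")
        if current_key is not None and key != current_key:
            outfile.write(f"{current_key},{subtotal}\n")  # control break
            written += 1
            subtotal = 0
        current_key = key
        subtotal += int(amount)              # do something with the data
    if current_key is not None:              # flush the final group at end of file
        outfile.write(f"{current_key},{subtotal}\n")
        written += 1
    return written


# "Attach" an input and an output file; in-memory files stand in for devices.
infile = io.StringIO("B,5\nA,3\nA,4\nB,1\n")
outfile = io.StringIO()
count = tabulate(sort_deck(infile), outfile)
print(outfile.getvalue(), end="")  # prints "A,7" then "B,6"
```

The program never needs more than one card and one running subtotal in memory. That is the property that made this style practical on small machines fed by card readers and tape.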

Notice the co-evolution here: the assumptions made about programming in COBOL came from the 1950s Flowmatic effort to automate 1920s card batch management, and the structure of the resulting language then influenced generations of hardware and software designers and managers.

Among other effects, this meant that COBOL programs would tend to be far more I/O-bound than CPU-bound. It also implicitly created a need for temporary, "near memory," storage to hold data pending completion of an I/O operation, thus defining the architecture of today's most expensive mainframes.
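The buffering idea can be sketched loosely in Python. A background thread reads ahead into a small, bounded pool of buffers while the main loop processes records already in memory, so the program waits on the device less often. This is only an analogy under stated assumptions: `read_ahead` and its `buffers` parameter are hypothetical names, and real mainframes overlapped I/O with channel hardware rather than threads.

```python
import queue
import threading
from typing import Iterable, Iterator

_END = object()  # sentinel placed in the pool when the input is exhausted


def read_ahead(records: Iterable, buffers: int = 2) -> Iterator:
    """Yield records in order while a background thread keeps up to
    `buffers` of them already sitting in memory ahead of the consumer."""
    if buffers < 1:
        # queue.Queue treats maxsize <= 0 as unbounded, which defeats the point
        raise ValueError("buffers must be at least 1")
    pool = queue.Queue(maxsize=buffers)  # the bounded "near memory" buffer pool

    def fill() -> None:
        try:
            for rec in records:    # the slow device read
                pool.put(rec)      # blocks while every buffer is full
        finally:
            pool.put(_END)         # signal end of input even if a read fails

    # daemon=True so an abandoned reader does not keep the process alive
    threading.Thread(target=fill, daemon=True).start()
    while True:
        rec = pool.get()           # usually already waiting in a buffer
        if rec is _END:
            return
        yield rec


print(list(read_ahead(range(5))))  # [0, 1, 2, 3, 4]
```

The size of the pool is the design lever. More buffers smooth out bursty devices at the cost of memory, which in the mainframe world was the expensive resource.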

COBOL was first codified in 1959; five years later, in 1964, IBM announced the matching System/360 computer family and completely revolutionized the systems industry. Thus the 360 reflected in hardware the fundamental COBOL operations, which themselves derived from the card-sorting requirements of pre-digital data processing. Even then IBM was still hedging its bet on commercial data processing. The 360 was therefore designed as an "all round" processor that could be configured to excel at either scientific or commercial processing, depending on customer needs. It also maintained backward compatibility with two previous, and mutually incompatible, IBM computer products.

A COBOL PRIMER - part 1
In 1985 a competent COBOL programmer was expected to produce about 15 lines of tested code per day.

Every COBOL program must have four divisions, in this order:

  1. An identification division
  2. An environment division
  3. A data division
  4. A procedure division

The examples used below come from the headers for those divisions in a COBOL66 program that ran just over 340,000 lines in nine major program pieces - each with these four main divisions. It was used to process Canadian health care claims and cost over $20,000,000 to develop in the early seventies.

The identification division specifies at least the program name and usually identifies the project manager or primary author of the code.

IDENTIFICATION DIVISION.
PROGRAM-ID. PCLMPROC1.
AUTHOR. PAUL MURPHY.

The environment division holds the file names and other installation-specific details that the job's JCL (Job Control Language) later binds to actual devices and data sets when the program runs. It may identify the computer the program is compiled on and the one it is intended to run on, and it typically has both a configuration section and an input-output section.

ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SOURCE-COMPUTER. IBM-XXXXXX-3081.
OBJECT-COMPUTER. IBM-XXXXXX-3084Q.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CLAIMS-01-FILE ASSIGN TO "IN1".
    SELECT CLAIMS-11-FILE ASSIGN TO "OUT1".

Specifying the computer to be used at run-time may seem odd today, but in the OS/360 environment machines were custom installed and individual machines often differed significantly in terms of internals like default instruction sets or externals like the choice of EBCDIC print trains or device naming.

Many compilers therefore had "cross compile" capabilities in which a compile job run on one 360 architecture machine produced code that incorporated device names and other external JCL information of relevance only to the target run-time machine - another 360 architecture unit.

The data division defines the variables to be used and the space needed for each. File record layouts go in its file section, while other variables are defined separately in the working-storage section.

DATA DIVISION.
FILE SECTION.
FD  CLAIMS-01-FILE
    DATA RECORD IS RAW-CLAIM-HDR.
01  RAW-CLAIM-HDR.
    03  REC-CODE-IN             PIC X(3).
    03  DATA-CENTRE-NUM.
        05  DATA-CENTRE-UNINUM  PIC 9(7).
        05  DATA-CENTRE-SEQNUM  PIC 9(7).
        05  DATA-CENTRE-BATNUM  PIC 9(7).
    03  PAYEE-NUM               PIC X(5).
    03  PRACTITIONER-NUM        PIC X(5).
    03  PRACTITIONER-NAME.
        05  PR-FIRST-NAME       PIC X(12).
        05  PR-SECOND-INITIAL   PIC X(1).
        05  PR-SURNAME          PIC X(18).
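This layout has no USAGE clauses, so every item defaults to display format. Each PIC X(n) or PIC 9(n) field occupies exactly n characters, and the group items DATA-CENTRE-NUM and PRACTITIONER-NAME are simply the concatenation of their subordinate fields. The offsets therefore just add up, giving a 65-character header record (3 + 7 + 7 + 7 + 5 + 5 + 12 + 1 + 18). The Python sketch below models that arithmetic. The `LAYOUT` table and the `field_offsets` and `parse_header` functions are hypothetical illustrations, not part of the original system, and they assume unsigned display digits with no packed-decimal (COMP-3) fields.

```python
# Hypothetical model of the RAW-CLAIM-HDR record above.
# (name, picture type, width): "X" = alphanumeric, "9" = unsigned display digits.
LAYOUT = [
    ("REC-CODE-IN", "X", 3),
    ("DATA-CENTRE-UNINUM", "9", 7),   # DATA-CENTRE-NUM group: 21 characters
    ("DATA-CENTRE-SEQNUM", "9", 7),
    ("DATA-CENTRE-BATNUM", "9", 7),
    ("PAYEE-NUM", "X", 5),
    ("PRACTITIONER-NUM", "X", 5),
    ("PR-FIRST-NAME", "X", 12),       # PRACTITIONER-NAME group: 31 characters
    ("PR-SECOND-INITIAL", "X", 1),
    ("PR-SURNAME", "X", 18),
]

RECORD_LENGTH = sum(width for _, _, width in LAYOUT)  # 65


def field_offsets():
    """Return {name: (start, end)} character offsets, 0-based and end-exclusive."""
    offsets, pos = {}, 0
    for name, _, width in LAYOUT:
        offsets[name] = (pos, pos + width)
        pos += width
    return offsets


def parse_header(record):
    """Slice one fixed-width header record into named fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    offsets = field_offsets()
    fields = {}
    for name, kind, _ in LAYOUT:
        start, end = offsets[name]
        raw = record[start:end]
        if kind == "9":
            if not (raw.isascii() and raw.isdigit()):
                raise ValueError(f"{name} is not numeric: {raw!r}")
            fields[name] = int(raw)
        else:
            fields[name] = raw.rstrip()   # alphanumeric fields are space-padded
    return fields


rec = ("C01" + "0000001" + "0000002" + "0000003" + "P0001" + "D1234"
       + "JOHN".ljust(12) + "Q" + "SMITH".ljust(18))
print(parse_header(rec)["PR-SURNAME"])  # SMITH
```

Nothing in the record says where one field ends and the next begins; the program's data division is the only description of the bytes. This is why a change to one PIC clause could ripple through every program that read the same file.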

The procedure division contains the actual application code.

PROCEDURE DIVISION.
0100-START-PROGRAM.
    OPEN INPUT CLAIMS-01-FILE.
    READ CLAIMS-01-FILE.

...which first connects the input file and reads the first record into memory

    MOVE ZEROS TO EOB-FLAG.
    SET BASE-RECORD-TMP TO NULL.
    MOVE SPACES TO PRACT-NAME-TMP.

... then initializes variables

    IF PRACTITIONER-ID-TMP = '1'
        MOVE 'EOB' TO EOB-FLAG
        PERFORM PROCESS-AT-END-OF-BLOCK.

...and finally does something with them

... before eventually clearing memory and stopping

    STOP RUN.
END PROGRAM PCLMPROC1.
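The fragments above follow an open, initialize, process, stop sequence. A common way to tie such steps together in structured COBOL is the "priming read". The program reads the first record before the loop, then processes it and reads again at the bottom of each iteration until an end-of-file flag is set. The Python sketch below models that control flow only. `run_program`, `handle`, and the `eof` flag are hypothetical stand-ins and are not taken from the claims system.

```python
from typing import Callable, Iterable, Optional


def run_program(cards: Iterable[str], handle: Callable[[str], None]) -> int:
    """Model of a COBOL-style priming-read loop; returns records processed."""
    deck = iter(cards)                 # OPEN INPUT: attach the file
    eof = False                        # like an end-of-file flag in WORKING-STORAGE

    def read() -> Optional[str]:
        """Like READ ... AT END: return the next record or set the EOF flag."""
        nonlocal eof
        try:
            return next(deck)
        except StopIteration:
            eof = True
            return None

    processed = 0
    record = read()                    # priming READ before the loop
    while not eof:                     # PERFORM ... UNTIL the EOF flag is set
        handle(record)                 # "do something" with the record in memory
        processed += 1
        record = read()                # READ the next record at the bottom
    return processed                   # CLOSE and STOP RUN would follow here


seen = []
print(run_program(["HDR", "DTL", "DTL"], seen.append))  # 3
```

Reading before the loop means an empty file is handled naturally: the flag is set by the first read and the body never runs.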

The key lesson to learn about COBOL? Long, boring strings of tabulating-machine setup and operations.

---

Some notes:

  1. These excerpts don't include footnotes, and most illustrations have been dropped as simply too hard to insert correctly. (The WordPress HTML "editor" as used here enables a limited HTML subset and is implemented to force frustrations like the CP/M line delimiters from MS-DOS.)

  2. The feedback I'm looking for is what you guys do best: call me on mistakes, add thoughts/corrections on stuff I've missed or gotten wrong, and generally help make the thing better.

    Notice that getting the facts right is particularly important for BIT - and that the length of the thing plus the complexity of the terminology and ideas introduced suggest that any explanatory anecdotes anyone may want to contribute could be valuable.

  3. When I make changes suggested in the comments, I make those changes only in the original, not in the excerpts reproduced here.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.