% fortune -ae paul murphy

From Chapter one: Data Processing and the IBM Mainframe

This is the 8th excerpt from the second book in the Defen series: BIT: Business Information Technology: Foundations, Infrastructure, and Culture

Note that the section this is taken from, on the evolution of the data processing culture, includes numerous illustrations and note tables omitted here.

Roots (Part Three: The System/360)

--- As early as the mid-sixties, people who had used IBM or other code to get basic GL and Payroll systems running were trying to get automated interfaces between them to work - but that meant sharing data and, more importantly, data definitions, between two or more systems. That, in turn, meant trying to get people who were certain they already knew what terms like "bed" or "voucher" meant to understand that they actually didn't.

In practice, therefore, integration was achieved mainly by continuing batch scheduling practices from the 1920s rather than through data management or software. In this approach jobs are separate, but the programmer building the second job knows the format of the file output by the first. Although the output format has nothing to do with naming standards, definitions, or data flow, this information is sufficient to allow the programmer to read data from the file, assign his own names and structures, and so develop something that works. As long as the second job's output format is documented, the programmer for the third step can do much the same, and so on.

At run time the dependencies are therefore handled by the scheduler - since job 1 has to run before job 2, and so on - just as they would be by program flow in an integrated application.
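A minimal sketch of what this amounted to - in modern Python rather than the COBOL or assembler of the period, with a file name, record layout, and field names invented purely for illustration - looks like this: the second job knows nothing about the first except the record layout it writes, and the only run-time "integration" is that the jobs run in order.

    # Toy sketch of "integration by file format" (illustrative names only).
    # Job 1 writes a fixed-width voucher file; Job 2 knows only the record
    # layout, assigns its own field names, and produces its own output.

    RECORD = "{vendor:<10}{invoice:<8}{amount:>10}\n"   # layout both programmers agree on

    def job1_write_vouchers(path):
        """First job: dump its results as fixed-width records."""
        rows = [("ACME CORP", "INV00017", "000125.50"),
                ("GLOBEX", "INV00018", "000980.00")]
        with open(path, "w") as f:
            for vendor, invoice, amount in rows:
                f.write(RECORD.format(vendor=vendor, invoice=invoice, amount=amount))

    def job2_post_to_gl(path):
        """Second job: re-parses the file by column position, under its own names."""
        with open(path) as f:
            for line in f:
                payee, doc_no, value = line[0:10].strip(), line[10:18], float(line[18:28])
                print(f"GL posting: {payee} {doc_no} {value:.2f}")

    # The "scheduler": the only integration logic is that job 1 runs before job 2.
    job1_write_vouchers("vouchers.dat")
    job2_post_to_gl("vouchers.dat")

Nothing beyond the documented layout and the run order ties the two jobs together - which is exactly why the scheduler, not the software, carried the integration burden.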

When such systems were built independently despite using much of the same data, each had to develop and manage its own data files. Over time these became known as "data silos," and diagrams showing many of them in use at one data center later became known as "stovepipe diagrams."

"De-stovepiping" therefore became the in-word for the process of trying to integrate multiple applications. That process generally had two contradictory components:

  1. Since successful daily operations depended mainly on the scheduler and the correct sequential execution of batches, management imposed ever tighter hierarchical controls on specifications, requirements, programming, and testing; while,

  2. The people trying to achieve systems integration started to use multiple diagramming or "data modeling" techniques to sell change.

The simplest diagramming method, called data flow diagramming, consisted of tracking changes to data as it moved from source card to printout.

As such it reflected conventions dating from its original development, during the nineteen-twenties, as a way to document process flows in card-based data processing.

Data Flow Modeling can be done at both data source (generally meaning document) and data element (meaning specific item) levels. The picture above, for example, is a trivial diagram showing three of the major data files opened at the beginning of an accounts payable application. At the detail level this would be couched in terms of PO line items and items received rather than the PO and receiving reports.
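For readers who think more easily in code than in diagrams, the same two levels can be sketched as data - here in Python, with names invented loosely around the accounts payable example rather than taken from any real system:

    # Toy representation of a data flow model at two levels of detail.
    # Entity, process, and element names are illustrative only.

    # Source level: documents/files and the process that moves data between them.
    source_level = [
        ("purchase_order_file", "match_invoices", "payment_voucher_file"),
        ("receiving_report_file", "match_invoices", "payment_voucher_file"),
        ("vendor_invoice_file", "match_invoices", "payment_voucher_file"),
    ]

    # Element level: the same flows, broken down to individual data items.
    element_level = [
        ("po_line_item.quantity_ordered", "match_invoices", "voucher.quantity_approved"),
        ("receiving_line.quantity_received", "match_invoices", "voucher.quantity_approved"),
        ("invoice_line.unit_price", "match_invoices", "voucher.amount_payable"),
    ]

    def print_flows(flows):
        """Print each flow as 'source -> process -> destination'."""
        for source, process, destination in flows:
            print(f"{source} -> {process} -> {destination}")

    print_flows(source_level)
    print_flows(element_level)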

In contrast, entity-relationship (E-R) diagramming originated with the parent-child relationships in hierarchical and network databases.

As a result, an E-R diagram like this one originally showed where and how attributes (data) fit in the hierarchy. Since this fit well with CICS/IMS, COBOL paragraphing, and so-called "structured programming," it quickly became a critical component of mainframe professional practice.
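A minimal sketch of the kind of parent-child structure such a diagram described - again in Python, with segment and attribute names invented for illustration rather than taken from any real IMS database - might look like this:

    # Toy parent-child hierarchy of the kind early E-R diagrams documented.
    # A real IMS database would define these as DBD segments, not Python objects.

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        name: str
        attributes: list[str]
        children: list["Segment"] = field(default_factory=list)

    vendor = Segment("VENDOR", ["vendor_no", "vendor_name"])
    invoice = Segment("INVOICE", ["invoice_no", "invoice_date", "amount"])
    invoice_line = Segment("INVOICE_LINE", ["line_no", "item_no", "quantity", "price"])

    vendor.children.append(invoice)          # VENDOR is the parent of INVOICE
    invoice.children.append(invoice_line)    # INVOICE is the parent of INVOICE_LINE

    def show(segment, depth=0):
        """Walk the hierarchy top-down, the way a hierarchical database is traversed."""
        print("  " * depth + f"{segment.name}: {', '.join(segment.attributes)}")
        for child in segment.children:
            show(child, depth + 1)

    show(vendor)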

In both cases, proponents of the methodologies that evolved around these tools developed them further. E-R modeling, in particular, grew in commercial importance during the seventies and early eighties to the point that most major consultancies and computer companies established their own proprietary versions and promoted them as sources of competitive advantage.

Diagramming was a natural use for the personal computer - in fact, when the Apple Lisa was released to developers in 1982, it came with a sample data flow diagramming application as part of the LISAtools set.

Three years later, when the IBM PC became somewhat graphics capable, entire professional specializations developed around attempts to use it to automate not just diagramming, but the link between diagrams and usable code - particularly IMS data definition language.

The most commercially successful of these methodologies involved a variation called "information engineering." This had nothing to do with engineering, but tried to use the PC to draw, store, and link models of complex applications and auto-generate some of the COBOL code needed to implement the models using database products like IMS. ---
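To make the code-generation idea concrete, here is a toy sketch - with an invented entity, field names, and type mapping, standing in for no actual information engineering product - of how a simple model might be turned into a COBOL-style record description:

    # Toy illustration of model-driven code generation: turn a simple entity
    # model into a COBOL-style record description. Names and the type mapping
    # are invented for illustration.

    PIC_CLAUSES = {"char": "PIC X({n})", "num": "PIC 9({n})"}   # hypothetical type map

    vendor_entity = {
        "name": "VENDOR",
        "fields": [("VENDOR-NO", "num", 6),
                   ("VENDOR-NAME", "char", 30),
                   ("CREDIT-LIMIT", "num", 9)],
    }

    def generate_record(entity):
        """Emit an 01-level COBOL record description for one entity."""
        lines = [f"01  {entity['name']}-RECORD."]
        for field_name, field_type, length in entity["fields"]:
            pic = PIC_CLAUSES[field_type].format(n=length)
            lines.append(f"    05  {field_name:<20} {pic}.")
        return "\n".join(lines)

    print(generate_record(vendor_entity))

Real tools of the period generated much more than record layouts - screen definitions, IMS database descriptions, program skeletons - but the underlying idea was the same: the model, rather than the programmer, was supposed to be the source of the code.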

Some notes:

  1. These excerpts don't include footnotes, and most illustrations have been dropped as simply too hard to insert correctly. (The WordPress HTML "editor" as used here allows only a limited HTML subset and imposes frustrations reminiscent of the CP/M line delimiters carried into MS-DOS.)

  2. The feedback I'm looking for is what you guys do best: call me on mistakes, add thoughts/corrections on stuff I've missed or gotten wrong, and generally help make the thing better.

    Notice that getting the facts right is particularly important for BIT - and that the length of the thing, plus the complexity of the terminology and ideas introduced, suggests that any explanatory anecdotes anyone may want to contribute could be valuable.

  3. When I make changes suggested in the comments, I make those changes only in the original, not in the excerpts reproduced here.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.