Draft Blog Entries

% fortune -ae paul murphy

The meaning of "relational"

A lot of talkback contributors and others have muttered about the general failure of most application developers to make full and effective use of relational technology in their work.

My guess, however, is that not many of these people would agree on the same set of assertions about what relational technology is, or about how it should really be used; firing off a discussion on these topics should therefore lead to something enlightening, useful, and fun.

As my contribution to the opening salvos on this, I want to start with some history and then draw an astonishing, but accurate, conclusion.

One of the things that drove early adoption of the IBM System 360 among data processing professionals was the belief that COBOL offered them an easy way to capture the gain in operational flexibility that went with the change from mechanical to electronic card processing. In fact, however, the apparent simplicity created by COBOL's expression of very simple step-by-step instructions in a kind of pidgin English masked the near-exponential increase in error potential as programs got longer and more complex:

010000 PROCEDURE DIVISION.
011000 0100 START-PROGRAM.
012000 OPEN INPUT CLAIMS-01-FILE,
013000 READ CLAIMS-01-FILE,
...
022000 MOVE ZEROS TO EOB-FLAG.
022001 SET BASE-RECORD-TMP TO NULL.
022002 MOVE SPACES TO PRACT-NAME-TMP.
...
101017 IF PRACTITIONER-ID-TMP = '1'
101018     MOVE 'EOB' TO EOB-FLAG
101019     PERFORM PROCESS-AT-END-OF-BLOCK.
...

Looks pretty simple, right? But it isn't. This was part of a program that ran well over 340,000 lines - and whose functional equivalent I could write today in perhaps 100 lines of Perl.
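To be clear about what I mean, here is a minimal sketch of that kind of record-at-a-time claims pass in the sort of Perl I have in mind. The file name, the delimiter, the field layout, and the process_end_of_block routine are my assumptions for illustration only - the real program's record layouts were far more involved.

#!/usr/bin/perl
# Minimal sketch only: assumes one claim per line in a delimited flat file,
# with the practitioner id in the first field. Real record layouts would differ.
use strict;
use warnings;

my $eob_flag = 0;    # mirrors EOB-FLAG in the COBOL above
my @block;           # claims accumulated for the current block

open my $claims, '<', 'claims01.dat' or die "claims01.dat: $!";
while (my $line = <$claims>) {
    chomp $line;
    my ($pract_id, @rest) = split /\|/, $line;
    next unless defined $pract_id and length $pract_id;
    if ($pract_id eq '1') {                # same sentinel test as the COBOL IF
        $eob_flag = 1;
        process_end_of_block(\@block);     # stand-in for PROCESS-AT-END-OF-BLOCK
        @block = ();
        next;
    }
    push @block, [ $pract_id, @rest ];
}
close $claims;

sub process_end_of_block {
    my ($claims_ref) = @_;
    printf "block of %d claims closed\n", scalar @$claims_ref;
}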

The earliest 360s, however, were provided to customers complete with sample COBOL applications in key areas such as financial report generation (e.g. AR, AP, or GL batch processing), and customers were invited to improve on the code as needed.

And since it looked easy, hundreds of thousands of data processing experts decided they could do it, tens of thousands of projects were launched, and very nearly 100% of them became financial and operational failures.

One problem in particular, that of migrating data between batch jobs, defied customer attempts at solution. What evolved, therefore, was what later became known as the stovepipe architecture: some data consistency within the jobs that, if executed in the right order, amounted to an application, but essentially none across such job sequences or "stovepipes."

IBM's response was the CICS/IMS combination, demonstrating how customer information could be made available to more than one COBOL transaction at a time. First released as demonstration code in 1968, IMS is fundamentally a simple hierarchical database manager in which records have parent-child relationships with other records. That fit so well with both the 360 hardware environment (itself modelled on COBOL) and the stepwise thinking embedded in COBOL (itself derived from tabulator control automation efforts) that the combination almost instantly came to dominate data processing development and is, in fact, still in intensive use today.

The IMS/CICS combination didn't visibly improve project success rates, but it did teach generations of data processing professionals to think in terms of parent-child relationships when defining data layouts, and thus spawned what became the entity-relationship modelling industry typified by Chen's 1976 ACM article, "The entity-relationship model: Toward a unified view of data" (ACM Transactions on Database Systems, 1(1):9-36, 1976).

At about that same time (1968), however, other people were taking a very different approach to the problem of applications design and data storage. Specifically, IBM launched its Future Systems project, and Dr. Codd and his colleagues were migrating ideas from science-based computing into the electronic data processing arena dominated by COBOL and the System 360.

Their approach to breaking the applications deadlock brought on by the combination of COBOL's inflexibility and the profession's refusal to think outside the batch was to get rid of the entire stack of ideas derived from the mechanical tabulator. In their view, applications should be nothing more or less than means of collecting or viewing data.

That then became the basis for the Future Systems hardware (48-bit, with a microcoded System/R implementation), for its operating system (essentially CRUD controls for DML against the database microcode), for its primary user-accessible language (the Report Generator language), and for the decision not to support COBOL or any other tabulator-replication functions (like JCL and partitioning) on the machine.

Unfortunately, data processing customers shown the system in 1970 reacted in outraged horror - and, except for a couple of ex-IBM managers in Germany who designed their new enterprise application around these ideas, the whole thing was relegated to academia until Digital's announced intention to conquer commercial computing with the VAX allowed supporters within IBM to get the thing released, in 1979, as the System/38.

All of which brings us to one of the most widespread and enduring misunderstandings in business history.

Codd's data storage ideas were based on set theory - which in academic usage at the time was still widely known as the theory of relations (see, for example, Roland Fraïssé, Theory of Relations, North-Holland, Amsterdam and New York, 1986). In that usage a relation is simply a set of tuples - which is exactly what a table of rows is. As a result he and his colleagues, all of whom had studied mathematics and most of whom had earned their PhDs from reputable universities during the forties and fifties, described their product as "relational."

Nobody on the science-based side of computing had any problem with this, and Codd's ideas were instantly and widely accepted. Indeed, the first widely used relational database, Ingres, became standard on virtually every Unix machine in academic use by mid-1976.

Unfortunately "relational" meant parent-child hierarchies to the data processing professionals IBM sold to - and they knew everything there was to know on the subject already: after all, they worked with IMS and its slightly more generalised follower, IDMS, every day.

So what happened? When the System/38 was eventually introduced the experts reviled it; a few lunatics from the wrong side of the computing divide saw it as the perfect solution to a significant set of problems - leading to its rapid adoption in warehousing and related functions, where its successors, the AS/400 and iSeries, are still dominant; and SAP started to make sales of its mainframe product on the strength of demonstrations done on a System/38.

Meanwhile, entity-relationship modelling eventually morphed into the excesses of the information engineering mania triggered by PC graphics, the disasters of data processing application development continued unabated, and a number of PC companies responded to the academic legitimacy accorded Codd's ideas by labelling their (non-relational) products "relational" as a sales ploy - thereby teaching a generation of PC users everything they needed to know about relational database use too.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specialising in Unix and Unix-related management issues.