Perceptional Change and Cost Consequences

- by Paul Murphy -

One of the most common mistakes you hear people make is demonstrated in the use of the word "platform" as a kind of shorthand that explicitly refers to hardware and software while implicitly proclaiming the general applicability of the speaker's expertise across many different hardware and software environments. In reality software environments are not interchangeable: the management methods and ideas applicable to one may, or may not, apply in others.

We all like to think we learn, but it's the unexpressed stuff we know for sure that tends to do us in, because we never think about these things once we've learned them - meaning that the computing environment in which we first learn to be effective tends to determine our "headset" until some other, stronger, environmental force causes us to re-examine the things we know for sure. As a result the certainties you bring to computing tasks are a better predictor of what it will cost your employer for you to succeed than the hardware or software you choose to work with.

Suppose, as an intentionally extreme example, that your organization creates about 250,000 records per day documenting medical fee-for-service claims. Each record contains about 96 fields including a provider id, a patient id, a service id, and the usual who, what, why, where, and when of the service. Your job will be to prepare weekly cheque requisitions and claim reconciliations for the providers, spot possible fraudulent or mistaken claims, and provide daily, weekly, and monthly reports showing total costs by provider and patient demographics like specialization, location, age, and accumulated claims frequencies. The files listing the fee schedule, provider authorizations, and provider memberships in consortia or partnerships are separately managed, but copied to you on a daily basis.
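
For concreteness, picture each claim arriving as one pipe-delimited line. The field names and values below are invented for illustration - a real record would carry roughly ninety more fields after these:

P10443|PT98812|S0772|2004-06-14|...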

Now, before reading further, stop long enough to note your own instinctive "how to do this" reaction.

Did you think CICS/DB2 and COBOL? This took almost two million lines of COBOL66 in an IBM 370/158 environment in the early seventies. A lot of that, of course, was file management, and a rewrite to DB2 in the mid eighties cut the number of lines almost in half - but the basic 24-hour report took 32 hours to run on a 3084Q.

Did you think in terms of a client-server set-up with interactive data cleaning, report management, and something like VB with SQL-Server as the main engine? It's a lot harder than it looks - and at least one organization that has tried to do this twice, first by combining OS/2 with mainframe DB2 and then with Windows 2000 everywhere, has long since announced its overwhelming success but is thought to be still running the COBOL/DB2 stuff behind the scenes.

Today the right answer is almost certainly to build everything around the power of associative arrays in PERL on Unix, with or without an RDBMS on the backend.

To people from the mainframe and Windows communities, PERL looks like a scripting language and is therefore not to be taken seriously as a programming environment, but that's a community perception based on assuming that what's known about scripting in those environments applies to Linux or any other Unix. It doesn't - and people have built everything from 3D games to web servers in PERL.

The other key, of course, is the effect of Moore's law on the cost/performance trade-off.

Something like:

my %fees;

while (<>) {
    chomp;
    # field order is illustrative; a real claim record has about 96 fields
    my ($prov_id, $pat_id, $serv_id, $day, $fee) = split /\|/;
    $fees{$serv_id} += $fee;            # running fee total per service code
}

foreach my $i (sort keys %fees) {
    print "$i\t$fees{$i}\n";            # service code and total billed
}

which can stack several million records in a single associative array, wasn't really practical until Tru64 and Solaris first broke the 32-bit memory barrier with 64-bit addressing - although the original OS/400 microcode database would have allowed something similar had enough RAM fit in the box.
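
The arithmetic behind that is simple enough. Assuming, say, a few hundred bytes per claim, 250,000 records a day is on the order of 100MB, and even a month's worth plus hash overhead and the lookup tables fits in a few gigabytes - trivial once 64-bit address spaces and cheap RAM arrived, hopeless before.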

Today, however, a machine as small as a $15,000 dual Opteron running Linux with a couple of US320 disks and 16GB of RAM can process the entire daily input file in about a minute - meaning that running twenty reports directly from the raw data takes about twenty minutes a day - and only part of an afternoon at month's end.
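
The twenty-reports figure is less of a stretch than it sounds, because several report tables can be built in a single pass over the file. The sketch below is illustrative only - the field positions, the extra age field, and the age_band helper are assumptions rather than a real claims schema:

#!/usr/bin/perl
# One pass over the daily claims file, building several report tables at once.
use strict;
use warnings;

my (%by_provider, %by_service, %by_ageband);

# group patient ages into ten-year bands: 0-9, 10-19, ...
sub age_band { my ($age) = @_; return 10 * int($age / 10); }

while (<>) {
    chomp;
    # hypothetical field order; a real record carries many more fields
    my ($prov_id, $pat_id, $serv_id, $day, $age, $fee) = split /\|/;
    $by_provider{$prov_id}        += $fee;   # feeds the weekly cheque requisitions
    $by_service{$serv_id}         += $fee;   # total cost by service code
    $by_ageband{ age_band($age) } += $fee;   # cost by patient demographic
}

for my $p (sort keys %by_provider)                 { print "PROV\t$p\t$by_provider{$p}\n"; }
for my $s (sort keys %by_service)                  { print "SERV\t$s\t$by_service{$s}\n"; }
for my $band (sort { $a <=> $b } keys %by_ageband) { print "AGE\t$band\t$by_ageband{$band}\n"; }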

You never actually know whether something like this can be done in reality without first doing it, and I haven't; but my main point here isn't that you can write a medicare claims processor in under a thousand lines of PERL, but that how you think about any systems job is determined by what you think you know - and that people moving to Linux, or any other Unix, from Windows or the mainframe need to adopt new ideas and ways of thinking rather than treating it as just a cheaper platform from which to express their existing certainties.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 20-year veteran of the IT consulting industry.