% fortune -ae paul murphy

Mainframe Linux

Here's a bit from the introduction to a recent article by Ken Milberg under the title "Mainframe Linux vs. Unix".

Today's new breed of smaller, cheaper mainframes, paired with the Linux operating system, look like an attractive alternative to Unix on RISC or SPARC servers. Linux on the mainframe seems to give us the best of all worlds: the dependability and resilience of over 40 years of hardware innovation and a flexible, reliable open source operating system. The big question: When should companies choose Linux mainframes over Unix?

This article looks at the features and performance of Linux on the mainframe -- in this case, the IBM System z server -- and compares it with Unix in terms of availability, features and performance.

From a performance standpoint, the mainframe has a number of characteristics that are not as prevalent for its mid-range (Unix) brethren. They include:

Dependable single-thread performance. This is essential for optimum performance and operations against a database.

Maximum I/O connectivity. Mainframes excel at providing for huge disk farms.

Maximum I/O bandwidth. Essentially, connections between drives and processors have few choke-points.

Reliability. Mainframes allow for "graceful degradation" and service while the system is actually running.

...

We've discussed some of the benefits of the mainframe, but why Linux?

Standardization

Many companies are already running Linux on distributed platforms. For those that already do, in addition to having IBM mainframes running centralized applications, using Linux on the mainframe becomes a natural evolutionary step for their business' mission-critical applications. Virtually any application that runs Linux on Wintel computers will run on System z, with only a simple recompile. This solution provides the organization with a corporate-wide Linux adoption policy.

Consolidation

Many distributed Unix and/or Linux servers can be consolidated onto one System z machine, which leads to substantial cost savings. For example, if a company has a server farm of 200 distributed servers, it can easily be consolidated into either one or two System boxes, hosting 60-70 Linux servers in a high-availability environment that can scale.

Notice that "distributed" here means "not on the mainframe"; that the comment about dependable single-thread performance falsely implies that Unix can't deliver it; that the vaunted I/O capability actually consists of the ability to connect many ESCON devices - small, SCSI-1 era drives that are individually slow and limited; and that the reliability claim applies to zVM running on the hardware, not to the Linux kernel running in the VM instance.

Let's leave that, however, to consider these extracts from an IBM puff piece headed "CALCULO boosts DB2 performance with Linux on System z", celebrating the successful conversion of a DB2 application from DB2 for z/VM to zVM/Linux:

CALCULO S.A. is an IT services company based in Madrid, Spain. As a leading provider of solutions and outsourcing services to the insurance sector, CALCULO relies on continual investment in research and development to keep itself at the forefront of the industry.

Business need: Constraints on database size with its existing DB2 for z/VM platform meant that CALCULO was running out of data capacity, and would soon be unable to deal with business growth. CALCULO wanted a way to improve database performance and increase capacity without moving away from the highly secure and reliable IBM mainframe platform.

Solution: CALCULO challenged IBM to provide a proof-of-concept for running the DB2 database under Linux on the System z platform. Following a successful project, CALCULO implemented an IBM z890 mainframe with an Integrated Facility for Linux engine, and set up two z/VM virtual machines: one to run the company's core business application under VM, the other to run DB2 under Linux.

Benefits: Maximum database capacity is considerably increased, eliminating the restrictions on business growth; 90 per cent improvement in database loading times; 80 per cent speed increase in restoring from backups; speed of extraction, indexing and calculation increased by 75 per cent; performance improvements should significantly reduce offline time, increasing productivity.

Since both sources argue that mainframe Linux offers tremendous operational savings by combining open source with legendary mainframe reliability and performance, the obvious question is to what extent, if any, we can apply these conclusions to our own decision making.

On that, let's start with another bit from the Calculo piece - this one revealing something about the configurations used:

IBM gave CALCULO two options: moving to DB2 for z/OS or running DB2 under Linux on an Integrated Facility for Linux (IFL) engine in a new IBM z890 mainframe, replacing two older Multiprise 3000 H30 servers.

...

"It used to take between eight and ten hours to reorganise the huge tables in our DB2 database - during which time, the system was effectively offline and nobody could do any work," says Raul Barón. "With the z890, it only takes about one hour, which is by comparison a negligible interruption.

"Similarly, although backups only took a couple of hours, restoring the data took ten hours, which was a drain on productivity. The new system should be able to cut this to just over two hours."

These stories are intended to make Linux on zVM sound impressive - but both rely on a critical assumption: that the reader knows little or nothing about relative system costs and performance.

In the Calculo case, for example, the drop from eight to ten hours on the Multiprise to only about an hour on the z890 works together with Barón's reference to "huge tables" to give the impression that the conversion demonstrated impressive gains on a big job - but that impression survives only because most of us know nothing about the machines involved.
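For what it's worth, the quoted benefit percentages are at least internally consistent with the hours Barón cites - a quick check in Python, using only figures from the IBM piece:

    # Hours taken from Barón's quotes and the "Benefits" list above.
    reorg_before, reorg_after = 10.0, 1.0       # table reorganisation
    restore_before, restore_after = 10.0, 2.0   # restore from backup

    def improvement(before, after):
        # Percentage reduction in elapsed time.
        return 100.0 * (before - after) / before

    print(f"reorg:   {improvement(reorg_before, reorg_after):.0f}% faster")     # 90%
    print(f"restore: {improvement(restore_before, restore_after):.0f}% faster") # 80%

So the marketing numbers check out arithmetically; the question is what they actually mean.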

The Multiprise H30 was a mini-390 maxing out at 60 IBM "MIPS", with ESCON (roughly comparable to SCSI) controllers and 1GB of RAM - i.e. it offered roughly the performance of a 200MHz Dell Pentium Pro. The end point is five years newer: a 2004 z890 maxed out at four processors offering up to about 300 IBM "MIPS" each (Power4 generation at 600 or 750MHz), 8GB of memory, and FICON connectors - about the same throughput as a four-way 1.4GHz Dell Xeon with a first-generation Fibre Channel controller.
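Take those ratings at face value (IBM "MIPS" are a marketing number, so treat them strictly as a relative unit) and the capacity jump is easy to approximate:

    # Ratings as cited above - IBM "MIPS" used as a relative unit only.
    multiprise_mips = 60        # Multiprise 3000 H30, maxed out
    z890_mips = 4 * 300         # z890: four engines at ~300 "MIPS" each

    print(f"raw capacity ratio: ~{z890_mips / multiprise_mips:.0f}x")   # ~20x

In other words, the eight-to-tenfold elapsed-time improvement Barón reports is no more than you'd expect from a five-year hardware refresh - on any architecture.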

This may seem exaggerated - equating a mainframe to an older Dell? - but check out a 2003 IBM Redbook on performance tuning Linux for the zSeries, in which the authors show that it's actually possible to get your Linux page swap rate on a dedicated 600MHz zSeries "engine" all the way up to 40MB/s - to a RAM disk! - and then provide a lot of guidance on getting your page read rate - on eight striped disks hooked up via four fibre channels! - up to a whopping 120MB/s.
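Divide those best-case figures out per device and the point makes itself (the Ultra160 comparison at the end is my number, not the Redbook's):

    # Best-case figures from the Redbook, divided out per device.
    striped_read_mb_s = 120       # page reads: 8 striped disks, 4 channels
    disks, channels = 8, 4

    print(f"per disk:    {striped_read_mb_s / disks:.0f} MB/s")     # 15 MB/s
    print(f"per channel: {striped_read_mb_s / channels:.0f} MB/s")  # 30 MB/s

    # For scale: a single Ultra160 SCSI bus of the same era was rated
    # at 160 MB/s.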

That's barely Pentium II/SCSI performance, and pathetic enough - but then consider the dollars. According to the tech news site, the Multiprise started at about $135,000 U.S. plus about $1,080 per month in maintenance; z890 costs range from about $240,000 U.S. (plus $1,500/month for maintenance) for the base model to $1.6 million (plus $20,000/month) for the top-end version. Red Hat Linux for the mainframe IFL was, I believe, around $40,000 per license per year when the conversion started - SuSE has consistently been cheaper and now runs around $12,000 per IFL per year.
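Put those figures together over, say, a three-year term (the term is my illustrative assumption; the prices are as cited above):

    # Prices as cited above; three-year term is an illustrative assumption.
    months, years = 36, 3

    base_z890 = 240_000 + 1_500 * months      # base model plus maintenance
    top_z890 = 1_600_000 + 20_000 * months    # top-end model plus maintenance
    suse = 12_000 * years                     # SuSE, per IFL per year
    redhat = 40_000 * years                   # Red Hat, per IFL per year

    print(f"base z890 + SuSE:   ${base_z890 + suse:,}")     # $330,000
    print(f"top z890 + Red Hat: ${top_z890 + redhat:,}")    # $2,440,000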

So what did they get for the money? Not performance - a few thousand bucks for a dual 3.2GHz Xeon with 16GB of RAM and a 4Gb FC controller would more than double the z890's throughput. And not reliability either: the z890's RAS features aren't accessible from Linux - meaning that failures cause a Linux reboot even if the underlying hardware continues to operate.

But if the example IBM uses on its brag site for zVM/Linux is somewhat questionable, what about the two arguments Milberg makes in his article?

First, he says that the mainframe is fast, reliable and easy to use, and then that Linux offers standardization and consolidation opportunities on the mainframe.

Fast it's not - and while the mainframe is reliable, Linux instances on the mainframe tend to require a lot of restarts, largely because of the unique hardware environment, the high-end/low-end swap during recompile, and the fact that the mainframer's every administrative instinct militates against managing Linux effectively.

"Ease of use" is, of course, in the eye of the beholder - but I personally doubt that you can find an actual zVM user with less than ten years of "progressively more senior" commitment to it who agrees that it's easy to use.

Worse, his standardization claim is expressed in this statement: "virtually any application that runs Linux on Wintel computers will run on System z, with only a simple recompile" and that's simply not true.

Spend some time reviewing the thousands of user questions on the Marist zVM/Linux mutual support site and you'll be struck both by the appalling lack of knowledge demonstrated by many of the questioners and by the obvious complexity of the problems they face making Linux work within the zSeries environment.

Consider, for example, this (more or less randomly selected) exchange between an apparent beginner and an accepted expert:

> So, if I'm understanding this correctly, taking a backup of a running Linux system from another LPAR gives you, at best, an unreliable backup.
>
That's certainly how I read it.
>
> That means that there are only two viable alternatives:
>
> Shut down Linux and do the backup from another LPAR or,
>
Yes. The plus is that you can then restore your Linux environment the same way that you restore the z/OS or z/VM environment. Also, you can manage your tapes using your standard tape management software (which doesn't exist at all on Linux, as I understand it). The minus is unavailability of the Linux system during this time (which is shortened by some sort of "snapshot", if you have that capability) as well as it being an "all or nothing" DASD-level backup / restore, which is not useful for restoring individual files.
>
> Use a backup client that runs within Linux and therefore participates in its file system processing, getting all the current and correct data for the backup.
>
Correct. But, again, Linux does not interface to the "normal" tape management systems used by other System z operating systems.
>
> Is that about it?
>
> The problem, as I see it, with backing up from another LPAR is that there is no incremental or differential backup capability. Nor is there any selective restore capability. It's an all-or-nothing backup/restore.
>
Yea.
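The expert's second alternative - a client running inside the Linux guest - is worth making concrete, because it's the only route to the incremental backups and selective restores the questioner is asking about. Here's a minimal sketch of the idea in Python (the paths are hypothetical, and a real site would use a proper backup client rather than this):

    import os
    import tarfile

    STATE = "/var/backup/last_run"              # hypothetical timestamp file
    TARGET = "/var/backup/incremental.tar.gz"   # hypothetical archive
    TREE = "/home"                              # hypothetical tree to protect

    # When did the previous backup run? Zero means "take everything".
    last = os.path.getmtime(STATE) if os.path.exists(STATE) else 0.0

    with tarfile.open(TARGET, "w:gz") as tar:
        for root, dirs, files in os.walk(TREE):
            for name in files:
                path = os.path.join(root, name)
                # Incremental: archive only files changed since the last run.
                if os.path.getmtime(path) > last:
                    tar.add(path)

    # Touch the state file so the next run is incremental too.
    with open(STATE, "w"):
        pass

Restoring a single file is then an ordinary extraction from the archive - exactly the selectivity a DASD-level dump can't give you. The catch, per the exchange above, is that nothing running inside Linux talks to the tape management systems the rest of the System z shop depends on.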

His second argument is that the mainframe lets you consolidate easily. Here's a repeat of what he says:

Many distributed Unix and/or Linux servers can be consolidated onto one System z machine, which leads to substantial cost savings. For example, if a company has a server farm of 200 distributed servers, it can easily be consolidated into either one or two System boxes, hosting 60-70 Linux servers in a high-availability environment that can scale.

Look past the internal contradiction (200 distributed servers supposedly consolidate onto one or two boxes hosting only 60 to 70 virtual servers each) and he entirely misses the practical problem: the cumulative network bandwidth of those 200 servers, even if they're a few years old, exceeds the maximum configurable bandwidth on IBM's biggest mainframe, the 2094-754, by at least an order of magnitude.

In other words, even if average utilization is sufficiently low to allow the zSeries machine to handle the load (i.e. under 1% per x86 server replaced) users will be queued up for network access nearly all of the time.
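To put rough numbers on the bandwidth claim - the per-server NIC speed and the mainframe allowance below are illustrative assumptions, not measured or quoted values:

    # Illustrative assumptions, not measured values.
    servers = 200
    nic_gbit = 1.0                    # one gigabit NIC per x86 server
    farm_gbit = servers * nic_gbit    # 200 Gbit/s aggregate

    mainframe_gbit = 20.0             # a generous allowance for a fully
                                      # configured 2094's network adapters

    print(f"farm aggregate: {farm_gbit:.0f} Gbit/s")
    print(f"mainframe max:  {mainframe_gbit:.0f} Gbit/s")
    print(f"shortfall:      {farm_gbit / mainframe_gbit:.0f}x")   # ~10x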

What's really going on with mainframe Linux illustrates a big part of the problem with data processing. When data processing started in the 1920s, its machines were expensive, its people were cheap, and all processing was done well after the transactions recorded had completed - and all of that carried over when data processing made the jump to digital card imagery and COBOL processing in the 1960s. As a result the key metric that evolved in the 1920s and 30s - utilization - continued to drive management decision making, and the argument today is just what it was then: if you're going to spend over $22 million on an IBM 2094 tabulator, you'd better run the thing flat out, 24 x 7.

Interactive users didn't exist when this thinking evolved, but we have them today - and what they want is immediate response, meaning lots of resources available on demand.

To give users the fastest possible response when they want it, you have to configure your gear so that there is a very high probability that the resources needed are available when the user needs them - meaning that the smaller the machine, the lower its average utilization has to be for you to meet your user mandate.
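That's just queueing arithmetic. Under the textbook M/M/1 approximation, mean response time grows as the service time divided by (1 - utilization) - a sketch, not a capacity-planning model:

    # M/M/1 approximation: mean response = service_time / (1 - utilization).
    service_time = 0.1    # seconds of work per request (illustrative)

    for util in (0.10, 0.50, 0.90, 0.99):
        print(f"utilization {util:4.0%}: "
              f"mean response {service_time / (1.0 - util):5.2f}s")

    # 0.11s at 10% utilization, 10.00s at 99%: chasing high utilization
    # and chasing fast response are directly opposed goals.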

And that's the obvious bottom line on mainframe Linux: the rationale for it is entirely based on managing to a metric whose operation produces exactly the opposite of what users want.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.