Questioning IBM

This is a draft for my LinuxWorld.com series. Please do not copy or distribute this without the permission of Linuxworld.com.

DECK

This is the first of three articles in which Paul Murphy takes a close hard look at running Linux on the mainframe. In this one he fills in the technical background and looks at the probable price/performance of the system relative to Linux on x86 and Solaris on SPARC.

In the second article he will look at what Linux on the mainframe may mean for the Linux marketplace and in the third he'll look at what IBM should be doing to make the case for Mainframe Linux.

Note: My plan called for Linuxworld.com to invite IBM to write a fourth article in this series putting forward their answers to my comments but the editor wanted to include IBM responses right in the first article.

Exploring IBM mainframe Linux

A few weeks ago I was asked what I thought of using Linux on the mainframe. The truth was that I'd simply never really thought about it at all, just assumed that it made sense on the basis of IBM's reputation and my own experience with MVS/XA. Forced to think about it, I realized that my MVS experience is 16 years out of date and that my perception then, that the 3084Q we were running on was fast relative to my workstation (a Sun 160 with a 16MHz MC68020 and 2MB of RAM), was coloring my reactions now.

What's in an acronym?
"HGOS" could be used as an acronym for "Hosted Guest Operating System" but (apparently to the dismay of VM experts everywhere) I'd like to use "GHOST." There are two reasons:
  1. it lets me perpetrate a truly horrible pun in the second article; and,
  2. I think it really is the right word. Think "ghosted" to describe virtual Linux operating systems run in a remote hosting context, or "ghostly" to describe memory management --Linux system memory and other resources outside the VM working set can be seen and measured, but don't necessarily have physical existence.
Although it's never really possible to set aside previous experience and knowledge and evaluate something purely on its current merits, the questions raised by IBM's move to offer Linux as a guest operating system on the zSeries are the same ones we should ask about any new Linux product. They amount to "Why should anyone buy this?" and break down as:

  1. What is it?
  2. What is it for?
  3. What's the hardware?
  4. What does it cost?
  5. How well does it work?
  6. How does it compare to the alternatives?

As it turns out, the IBM product set is not cost-effective relative to other Unix options such as Linux on Intel or Solaris on SPARC, but this doesn't mean no one should buy into it. As next week's article tries to make clear, there are people who might be well advised to buy into this technology despite its costs and limitations.

Mainframe Linux: What is it?

As of Feb 26/02 IBM said:

Linux for zSeries will support the new 64 bit architecture in real and virtual mode on zSeries servers. The Linux code to exploit the 64 bit architecture will be available from the IBM developerWorks web site at a later date. Linux for S/390, currently available on G5, G6 and Multiprise 3000 processors, will be able to execute on zSeries servers in 31 bit mode.

In brief, one of the standard versions of Linux is compiled for the zSeries and loaded as a guest operating system under VM where it resides on one or more real or memory resident mini-disks (IBMese for logical volume) and acts for all intents and purposes like a real machine running the selected Linux system.

Some Background

In the earliest days of commercial data processing, jobs were entered into the computer on decks of punched cards. In preparing these, the user started with a bare hardware system, so the first set of cards in a card box defined the resources to be used by the machine - including things like the hex addresses for the memory range and I/O devices to be used. The next set of cards held the program, followed by the data, and eventually a fourth set of cards carried the job-end instructions freeing those resources again. Although today's JCL [Job Control Language] evolved from the control statements section of this, CP/40 [Control Program] originated as an attempt to create a set of virtual machines predefining resources that could be interactively assigned to jobs on the logical, rather than physical, level.

Out of this developed CP/67 and its followup - CP/VM [Control Program Virtual Machine] or just VM. (For lots of interesting detail on the history and the conflicts between the batch oriented majority within IBM and the attempts to create an interactive environment that led to VM/CMS see VM and the VM Community: Past, Present, and Future by Melinda Varian.)

The VM/CP combination doesn't operate directly on the hardware: both exploit an integrated microcode component called the Processor Resource/System Manager (PR/SM) which handles things like basic system resource partitioning and operates a bit like a PC BIOS.

A fully configured zSeries can be partitioned into no more than 15 logical partitions [LPARs] in this way although further micro-partitioning is possible within the CP/VM environment. Since each such LPAR is independent of all others, it can run VM or any other OS, including Linux, separately although each remains dependent on the underlying hardware and microcode.

It is possible to run Linux as a single operating system controlling the entire z800 processor. This machine starts at about $250,000 and has a maximum of four system CPUs plus one control and storage assist processor. Running Linux in this way turns the mainframe into a simple four-way box with some enhanced reliability and performance characteristics.

This approach is largely ignored here both because it isn't really different from running Linux on any other four-way box and because the major benefits IBM advertises for mainframe Linux generally derive from VM's ability to switch between multiple Linux GHOSTs on the same machine.

Amdahl's UTS product provided native System V Unix on the 470 mainframe as early as the mid-eighties, but the highly interactive nature of Unix conflicted with mainframe design, making it a very poor performance bet relative to running BSD on Vaxen. For some special purposes, however, this didn't matter and it is, I believe, still in use.

It is possible to dispense with VM entirely, either across the whole machine or only in one or more LPARs. Without VM, an LPAR - whether it has one engine ("engine" is an IBM term denoting a CPU and the I/O hardware it's embedded in) or all of them - can only run one instance of whatever operating system is booted from CP, making it possible to run Linux in near-native mode on a dedicated LPAR or machine. Conversely, because the CP/VM combination handles all resource allocation, you can use VM to share resources among guest operating systems or to establish and maintain communication pathways between them.

What is it For?

In an FAQ accompanying its January 25/02 announcement of an entry level configuration (estimated at $250,000 with one Linux engine enabled) IBM described the target market for the machine as:

The IBM zSeries Offering for Linux is mainly targeted to server consolidation workloads of 20 to many hundreds of servers. The offering is designed from the ground up for server consolidation giving you unparalleled Total Cost of Ownership through consolidation of UNIX, Windows NT and Linux applications to Linux on zSeries.

Furthermore, it is an excellent application development platform for large customers or Independent Software Vendors (ISVs) requiring a 64-bit target platform. It provides an ideal lower-entry-price, new workload platform for customers who want the qualities of service provided by zSeries processors.

Note that most IBM mainframes are shipped with the maximum number of CPUs and amount of memory for that line pre-installed. These resources are then licensed for use as needed to tune the hardware to the specific workload to which it is applied.

According to IBM advertising and whitepapers the five most important benefits offered by this approach are:

In addition most published references to the IBM mainframe also have the words "high performance" or something very similar in, or very near, the same sentence. This implies that mainframe use confers an additional benefit: access to very high levels of throughput.

What's the Hardware?

The basic zSeries hardware consists of a single integrated CPU board [known as an MCM] with 20 on-board CPU pairs, up to 64GB of RAM, and 24 1GB/Sec full duplex I/O ports. Both processors in each pair execute the same code; if the results fail to match, that pair is taken off-line and the spare pair is switched in. By default the board is structured to have up to 16 central processors [CP] in a pair of tightly coupled 8-way SMP configurations, three System Assist Processors [SAP] which direct I/O, and one spare. Current generation processors run at about 770MHz in 64bit mode.

Each I/O port can be multiplexed four ways to produce a total of 96 I/O connection points. For disk related I/O these are now usually FICON [IBM Fiber channel] connections to independently memory buffered disk arrays and operate at industry standard rates. According to FICON and FICON Express Channel Performance Version 1.0 by Cronin et al (IBM, Poughkeepsie, February, 2002):

In addition to its bandwidth improvements, the zSeries 900 native FICON Express channel also improves the number of 4K byte operations/sec that can be processed. If a single native FICON Express channel is connected via a native FICON director to two different native FICON Shark CU ports, it can process up to 7200 IO/sec as shown in Figure 5 above. With 96 FICON and 160 ESCON, a z900 could theoretically drive a peak of over 800,000 4K I/O operations per second.
[Page 14]

Note that FICON controllers have 333MHZ PPC processors providing what amounts to a DMA service without interrupting main processing.

The introduction to the RedBook on Linux on IBM zSeries and S/390: ISP/ASP Solutions by Michael MacIsaac, Peter Chu, et al, contains an excellent overview of the hardware. Thus a fully configured zSeries machine offers:

  1. up to 16 SMP capable CPUs running at about 770MHZ;
  2. up to 16MB of coherent cache for each of two 8 CPU blocks;
  3. up to three 770MHZ dedicated I/O processors;
  4. up to 96 FICON channels each with a 333MHZ CPU;
  5. up to 1.5GB/Sec in per processor memory bandwidth;
  6. up to 24GB/Sec in overall system bandwidth;
  7. up to 3.125GB/Sec instantaneous bandwidth to external DASD [IBMese for "disk"].

The hardware architecture and the microcode supporting it are heavily optimized for processing batch transactions. In these:

  1. control and resource allocations are set at the beginning of the run and do not change during it (in effect loading and running a minimal OS tailored specifically to that application);
  2. variable application code and transaction data sets are quite small;
  3. each transaction is likely to proceed in stages interrupted by database calls or other I/O processing; and
  4. individual transactions are independent.

As a result the main CPU should have ready access to co-processors to handle I/O, should have enough on-board cache to hold the main instruction loop and a small data set, should have very fast access to the information returned from database calls, and multiple CPUs should share the same external cache to avoid having to tie transactions to specific processors and thus incur wait states if that processor is busy on return from an I/O call.

IBM pioneered what we now think of as interactive on-line transactions processing using the CICS/IMS combination in the late sixties, but how this works is very different from what you may be used to with Unix. When a PC boots up Linux and a user logs in, there may be as many as 60 to 90 running processes supporting that activity. Add apache/tomcat with mod_perl or PHP and a database to support on-line transactions, and the CPU could be switching between 140 or more concurrent processes.

In mainframe on-line work the transactions processor loads as a batch job - it just runs continuously getting input from a pre-processor rather than a file - and is usually the only thing running in that LPAR. (Note that this applies to so called "production processing" under zOS, not to VM's batch facility. When the latter runs, it acts more like a shell script than a traditional batch and can share resources with other VM processes.)

For a (1070 page) introduction to batch processing in the CICS/DB2 environment see:

Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques; Morgan Kaufmann Publishers, San Francisco, 1993.

For a quick introduction to the look and feel of this technology today, check out a February 2002 IBM Redbook by Andrea Consett et al on using IBM VisualAge Cobol with CICS and DB2.

That RedBook is predominantly about using the editor and related facilities but it includes an example (Page 135) of what JCL [Job Control Language] statements look like after almost 40 years of progressive refinement:

Here is a simple compile job for a batch program. Except for the fact that for a DB2 program there is an additional precompile step, this compile JCL also applies to compiling a DB2 program, including a DB2 Stored Procedure. See 9.5, "Preparing your program for debugging" on page 104 for more information about compiling a program for test.

//CONZETTC JOB (999,POK),NOTIFY=CONZETT,
//       CLASS=A,MSGCLASS=X,MSGLEVEL=(1,1),TIME=1440
//COB     EXEC PGM=IGYCRCTL,
//        PARM='QUOTE,LIB,OBJECT,XREF,RENT,TEST'
//STEPLIB  DD    DSNAME=IGY.V2R2M0.SIGYCOMP,DISP=SHR
//SYSLIB   DD    DSNAME=CONZETT.AJC.COPY,DISP=SHR
//SYSIN    DD    DSNAME=CONZETT.AJC.COBOL(HELLO),DISP=SHR
//SYSPRINT DD    DSNAME=CONZETT.AJC.LISTING(HELLO),DISP=SHR
//SYSLIN   DD    DSNAME=&&LOADSET,UNIT=SYSDA,
//         DISP=(MOD,PASS),SPACE=(TRK,(3,3)),
//         DCB=(BLKSIZE=3200)
//SYSUT1   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT2   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT3   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT4   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT5   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT6   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSUT7   DD    UNIT=SYSDA,SPACE=(CYL,(1,1))
//LKED     EXEC  PGM=HEWL,COND=(8,LT,COB),REGION=4096K
//SYSLIB   DD    DSNAME=CEE.SCEELKED,DISP=SHR
//         DD    DSNAME=CONZETT.AJC.LOAD,DISP=SHR
//         DD    DSNAME=SYS1.LINKLIB,DISP=SHR
//SYSPRINT DD    SYSOUT=*
//SYSLIN   DD    DSNAME=&&LOADSET,DISP=(OLD,DELETE)
//         DD    DDNAME=SYSIN
//SYSLMOD  DD    DSNAME=CONZETT.AJC.LOAD(HELLO),DISP=SHR
//SYSUT1   DD    UNIT=SYSDA,SPACE=(TRK,(10,10))
//LKED.SYSIN DD *
          ENTRY HELLO
          NAME HELLO(R)
/*

Although the zSeries achieves a near-perfect balance within this set of requirements, the hardware cannot, by itself, be tailored to specific applications - only to a generic class of applications. The refinement needed to further tune the machine to its workload is implemented in licensing, not hardware or microcode. Systems are shipped with the maximum number of CPUs and amount of memory pre-installed, and then tuned to the workload by adjusting the number of processors, or amount of memory, licensed for actual use.

What does it Cost?

Pricing information is both difficult to obtain and quite different from normal Unix pricing structures. When you buy a Unix machine from someone like Sun or Dell, the OS is part of the package, not something you lease separately. In contrast a fully configured zSeries, estimated at around $4.8 million, usually needs separate monthly operating systems and related software licensing that can add considerably to the total cost of the system.

On smaller machines some licensing does devolve to the model most people are used to. For example, IBM offers zVM, for the z800 only, on a perpetual license for a rumored $45,000 per engine (with a maximum of four engines per system) and only $11,000 in per license annual maintenance.

A sales document by Sytek Services (an IBM mainframe reseller) offers some cost information for Linux on the mainframe with tabulations showing an estimated $22,031 in monthly software license fees, $26,000 for the initial TurboLinux setup (SuSE is rumored to run about $11,000 per engine), and $3,200 a month for Linux support - all for running Linux in one partition on an entry level machine.

A tech-news site offers precise list price and configuration information for the basic "raw iron". They show that:

  1. A z800 with one fully configured Linux engine starts at about $410,000 plus $4,822 per month
  2. A z800 with the maximum four fully configured Linux engines starts at $1,470,000 plus $12,861 per month
  3. A 10-way z900 starts at about $3,483,000 plus $39,641 per month
  4. A 16-way z900 starts at about $4,851,000 plus $46,692 per month

Assuming these costs are authoritative - and tech-news is widely respected within the mainframe community - they would represent list prices before rumored average 30% discounts, but also without the additional disk and licensed software resources normally needed to run this type of gear in an enterprise data center. Real system costs are likely to net out marginally higher because few customers buy stripped down systems without significant additional software.
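
To put these list prices on a roughly comparable footing, here's a short Python sketch that converts them into three-year totals. It uses only the figures quoted above and assumes the monthly charges stay flat over the term; discounts, disk, and additional software are ignored.

# Rough three-year totals for the configurations listed above.
# Assumes monthly charges stay flat; ignores discounts, disk, and added software.
configs = {
    "z800, one Linux engine":   (410_000,    4_822),
    "z800, four Linux engines": (1_470_000, 12_861),
    "z900, 10-way":             (3_483_000, 39_641),
    "z900, 16-way":             (4_851_000, 46_692),
}

for name, (base, monthly) in configs.items():
    total = base + 36 * monthly
    print(f"{name:26s} three-year total: ${total:,}")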

How well does it work?

Benchmarks

IBM does not seem to publish much zSeries performance information. The Redbook on running Linux in an ASP mode quoted above has extensive comments on benchmarking, all of which I believe can be summarized as saying that IBM does not do publicly audited benchmarks because there are no benchmarks which reflect the mainframe's strength.

Of course, the question is whether this should be of concern to major benchmark management organizations like the Transaction Processing Performance Council [tpc.org] and the Standard Performance Evaluation Corporation [spec.org], or to IBM and its customers.

In any case, as of Feb 25/02, I could not find any audited benchmark results for the zSeries, or for releases of the S/390 (its immediate predecessor) subsequent to about 1998 - when the S/390 lost badly to early versions of the Sun E10K - listed among the benchmark reports offered by SAP, Oracle, Peoplesoft, SPEC, or TPC.

In searching for applicable benchmark information using Google I came across an analysis by David Boyes, which is frequently cited in lieu of a benchmark, and three candidate benchmarks:

  1. one on sendmail;
  2. one on Lotus Domino; and,
  3. one on a financial system written in GT.M.

The Boyes Report

This analysis has been widely reported and often quoted (just do a Google search using "test plan Charlie" "David Boyes" - quotes as shown! - for a long list of press and other citations) in support of the claim that the mainframe can run thousands of concurrent Linux ghosts.

Three quotations from the report tell the story:

Boyes selected a simple application - presenting a static page via the Apache web server - as a good test case that, for documentation purposes, could be quickly constructed and instrumented. The LPAR available to Boyes consisted of two CPUs from a G5-class System/390 along with 128MB of central storage and a recently acquired EMC disk unit that had not yet been placed into service and was available to be dedicated to the LPAR for testing purposes ...

During this phase, Boyes also developed some short REXX execs to duplicate and customize a Linux instance. He discovered that creating and configuring a new Linux instance from one of the master copies involved no more than two commands and about 90 seconds to duplicate and configure the system. This code was later used as the core of the production solution...

Finally, Test Plan Charlie, the "let's push it until it falls apart" test, was created to gauge the upper limits of the solution. Charlie began at 5 p.m. on a Friday; by midnight Saturday, it had reached 41,400 servers and it had run out of resources on the System/390 LPAR. While the system did not crash, it was unable to create new servers due to lack of resources.

Note that the Redbook Linux on IBM zSeries and S/390: ISP/ASP Solutions: Create and maintain hundreds of virtual Linux images shows, on page 214, that it takes about 30 seconds just to copy a 250 cylinder mini-disk, so the 90 seconds is much more credible than the 2.69 second average (=111,600/41,400) implied.

I have problems with this. He says, for example, that it takes about 90 seconds to create a Linux ghost but then claims to have created 41,400 of them in the 111,600 seconds from 5PM Friday to midnight Saturday. I don't understand this: at 90 seconds each, 41,400 instances should have taken 43 days to create even if each new instance added absolutely nothing to system load. If he created them in parallel he would have had to be running an average of about 33 creation streams writing continuously (at 70MB each, roughly 26MB/sec sustained) before any other system activities - like running the already created instances.
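
Since the arithmetic matters here, the short Python sketch below reproduces the calculation. The only number not taken directly from the reports quoted above is the assumption that creating a ghost writes about 70MB of data.

# Checking the Test Plan Charlie arithmetic quoted above.
seconds   = 31 * 3600    # 5PM Friday to midnight Saturday = 31 hours = 111,600 seconds
ghosts    = 41_400       # instances reportedly created
per_ghost = 90           # seconds reportedly needed to create one instance
mb_each   = 70           # assumption: data written per new instance, in MB

serial_days  = ghosts * per_ghost / 86_400   # if created one at a time
avg_interval = seconds / ghosts              # implied average creation interval
streams      = ghosts * per_ghost / seconds  # parallel creation streams needed
write_rate   = ghosts * mb_each / seconds    # sustained write rate

print(f"serial creation time:       {serial_days:.0f} days")      # about 43 days
print(f"implied interval per ghost: {avg_interval:.2f} seconds")  # about 2.7 seconds
print(f"creation streams needed:    {streams:.1f}")               # about 33
print(f"sustained write rate:       {write_rate:.0f} MB/sec")     # about 26 MB/sec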

It probably is possible to get 41,400 instances on a small machine -if you have all of them share essentially everything, don't load a separate and individually complete working set for each ghost, and don't connect each instance to an external network. In my opinion, however, this deserves all the credibility of a claim that my ability to run 10,000 concurrent "sleep 7200" processes on a Sun Model 80 workstation proves it can support 10,000 concurrent users.

Mr. Boyes appears to be careful with his wording but, from a community perspective, the problem is that his experiment is widely presented as realistic. Consider, for example, this from the header on an interview he did with LinuxPlanet.com:

Anyone who works with Linux on IBM's System/390 mainframes has certainly heard of David Boyes. He made history early in the project by running no less than 41,400 Linux images on a single mainframe, all of them doing real work under simulated load as web servers.

or, nicely combining two bits of utter nonsense in one citation, this from a column in the San Francisco Chronicle:

To back up its claims, IBM steered me to David Boyes of Sine Nomine Associates, a networking systems consultancy in Ashburn, Va. Boyes recently helped a major East Coast telecommunications company install a new S/390, which can host up to 41,400 separate "virtual servers" running Linux. Before settling on the mainframe, Boyes and his customer considered using big Sun Microsystems servers instead, but they figured they would have needed about 750 of them, filling more than eight times as much expensive data-center space, to get equivalent computing power.

Similarly, if you check out IBM's compilation of his video clips and related materials you'll be able to hear and watch him say that each instance is a fully functional, separate, ghost system and only cynical and suspicious people like me will notice that he says this about ghosts in general, but not specifically about the 41,400 he ran.

You'll also hear him make statements about Unix that I don't think are true. For example he repeatedly claims that resource management either does not work or does not exist in Unix. In reality user resource limits are consistent with basic Unix philosophies and have been available at least since BSD 4.x in the early eighties. Stronger allocation tools are somewhat inconsistent with core system beliefs but are often commercially required and so available from all major vendors: Sun offers Solaris Resource Manager, HP has both a Workload Manager and a Process Resource Manager, AIX supports Workload Management, and Tru64 implements these functions as Class Manager.

There are many other things I don't grok either. For example, I don't understand how he could multiplex 41,400 apache instances into available TCP/IP resources without dropping performance below that of two cans connected with a bit of string.

My main question, however, is how he got 41,400 instances to fit into a 128MB machine. The problem here is that the default Linux interrupt timer runs 100 times per second so VM would have to page in the ghost, start it running (that's over 50 processes for a typical Linux instance running Apache), process the interrupt, and then either page it out or do whatever work is queued up; and do all this 100 times per second per instance. If a minimal working set for the Linux kernel [6MB] plus an Apache instance [2MB] runs to about 8MB, his machine would have had to handle something like (8MB x 100 times per second x 41,400 ghosts =) 33,120 GB/sec in throughput. Since that's about 1,380 times the maximum theoretical capacity of a fully configured next generation z900, I have trouble believing it happened.
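
The arithmetic is easy to check; the Python sketch below uses the working set and timer figures from the paragraph above, both of which are, of course, assumptions about a "typical" minimal ghost rather than measurements (1GB is treated as 1,000MB, as in the text).

# Paging traffic implied by waking every ghost 100 times per second.
working_set_mb = 8        # assumption: ~6MB kernel plus ~2MB Apache
timer_hz       = 100      # default Linux timer interrupts per second
ghosts         = 41_400
z900_gb_sec    = 24       # total system bandwidth quoted earlier, GB/sec

traffic_gb_sec = working_set_mb * timer_hz * ghosts / 1000
print(f"implied paging traffic: {traffic_gb_sec:,.0f} GB/sec")             # 33,120 GB/sec
print(f"ratio to a 24GB/sec z900: {traffic_gb_sec / z900_gb_sec:,.0f}:1")  # about 1,380:1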

What he appears to have done was modify the timer (the Linux kernel needs about 45,000 different lines of code to work on the mainframe - see Chart 10) to avoid these interrupts and so reduce the paging load. As he puts it in the Dancing Penguins document:

Default Linux idle task management concept is not well-suited for hypervisor environments.
  • Default 100 hz timer pops consume substantial resources for no benefit if system is idle.
  • Must be adjusted proportionately -- other important timing functions are derived from this value.

If I understand things correctly, the adjustment needed to fit the paging requirements for 41,400 ghosts into available bandwidth on a brand new 16 CPU z900, never mind a two CPU G5, means that the interrupt frequency has to be reset from the default 100 times per second to about once every 13.8 seconds. Maybe I'm missing something, but this doesn't seem practical - if each interrupt caused Apache to serve up one character, you could drive the entire 3,085 mile length of the I90 from Seattle to Boston and back in about the time it would take for all 41,400 ghosts to serve up this article.
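
The 13.8 second figure comes from the same numbers: waking every ghost once means moving 41,400 working sets of about 8MB each, and at 24GB/sec that single sweep takes nearly 14 seconds. A quick check, under the same assumptions as before:

# Fastest possible wakeup interval if every ghost needs its working set paged in.
ghosts, working_set_mb, bandwidth_gb_sec = 41_400, 8, 24

sweep_gb = ghosts * working_set_mb / 1000   # data moved to wake every ghost once
interval = sweep_gb / bandwidth_gb_sec      # seconds per complete sweep

print(f"one sweep of all ghosts moves {sweep_gb:,.1f} GB")            # 331.2 GB
print(f"minimum interval between wakeups: {interval:.1f} seconds")    # about 13.8 seconds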

Sendmail

In a press release on the IBM site sendmail.com claims that an IBM zSeries can support up to 2 million e-mail accounts but provides no data to back up the claim.

I have been discussing this issue with a Sendmail representative. As part of this I've received a confidential document which includes a "preliminary" compilation of results from a partial mstone test (see this report of a test on a 733MHz Linux system for details on mstone) run on the mainframe.

The test reports we have are limited to pop3 mail users accessing the system only via traditional dialup lines to download five messages per day while sending nothing. Only 10% of users are "active."

Although the report we have only shows partial results for three tests it does include one which corresponds on three values (z900, 400,000 mailboxes, 13% CPU utilization) to IBM's report:

Tests conducted in a controlled environment with a 400,000 user load resulted in very low hardware utilization (approximately 13%). Based on these results, IBM and Sendmail project that a single IBM zSeries mainframe may be able to support more than two million user mailboxes running the POP protocol!

This test failed on Sendmail's Login & Retrieval QoS [quality of service] criteria. Neither of the other tests shown correspond to IBM's numbers, although Sendmail does mention a 250K user test which passed QoS tests but provides no detail for it. Perhaps "may" is a key word in IBM's sentence?

When asked about a later [January 29/02] press release claiming that sendmail on a two processor Proliant supports 10,000 users at 215% [sic] less than a four year old Sun 450, Jon Doyle, for sendmail, wrote "We did nothing more than check retail pricing."

Another vendor, which claimed in its press release to have participated in a similar sendmail benchmark, eventually sent this note to explain why it could not forward the actual data:

I am told that the Sendmail legal department would not authorize the release of what they consider internal information. The press release was completed prior to this decision.

Domino

With reference to the Domino benchmark, Joann Duguid, Director of Linux on IBM eServer zSeries, sent me a TCO study on the use of Bynari Insight which compares the cost of running Microsoft Exchange server on NT to the cost of getting comparable services using the Bynari Insight Server running under Linux on the mainframe.

The Bynari product looks like it might be pretty neat but the TCO study certainly isn't.

For example it provides a seemingly detailed, but unsupported, cost tabulation to show that the three year cost of running the Bynari product for 5,000 mailboxes comes to $3,193,210 using Linux on the mainframe - including 50MB of disk space per user and a Bynari license fee of $11.60 per user.

At the 5000 user level they show the cost per user over 36 months as about twice that for running Exchange Server under NT, but then make their fundamental point about mainframe scaling by working out the cost ($3,278,210) for 50,000 users on the mainframe and comparing that to the $5,447,900 they compute for using NT/Exchange.
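
The per-mailbox arithmetic behind that comparison is worth setting out explicitly; the Python sketch below uses only the three-year totals quoted above.

# Cost per mailbox over 36 months, from the totals quoted in the TCO study.
cases = {
    "Bynari/Linux on the mainframe, 5,000 users":  (3_193_210,  5_000),
    "Bynari/Linux on the mainframe, 50,000 users": (3_278_210, 50_000),
    "NT/Exchange, 50,000 users":                   (5_447_900, 50_000),
}

for name, (total, users) in cases.items():
    print(f"{name:46s} ${total / users:>8,.2f} per user over three years")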

The pricing shown includes some hefty discounts:

  1. they present the three year total cost of owning the machine at $1,552,100 --but the model cited lists at about $2,646,000 before maintenance, software licensing, and DASD or TAPE;
  2. they present the Bynari licenses at $11.60 each --but the lowest list offered by Bynari is $18 each in packs of 1,000; and,
  3. they upgrade the machine from 5,000 users to 50,000 - adding 45,000 licenses and 2.2TB of ESS/SHARK DASD (=50MB x 45000 users) at a total cost of only $85,000.

How this kind of thinking plays out in the real world is nicely illustrated by a piece in E-week for March 25/02. This article describes a company's success in moving 700 [sic] users to Bynari under Linux on a mainframe at a cost of "just $26,000" --and "between 7 percent and 10 percent" of their mainframe MIPS.

The Notesbench.org site has information about a benchmark result obtained by running Domino R4 mail against an R5 server on a 10-way S/390 under OS/390. In all other cases I looked at, full pricing information was provided, including one based on a $4.5 million IBM iSeries, but for the zSeries they only report that:

"The $/User and $/NotesMark are not reported because the NotesBench certification is based on a total system cost exceeding $500,000.00."

Based on the tech-news numbers, this machine would have had a base cost of about $3,735,000 before maintenance, software licensing, or DASD. At that base cost its NotesMark score of 42,508 for 32,000 concurrent users gives it a minimum estimate of around $87.86 per NotesMark.
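
That estimate is simply the tech-news base price divided by the reported score, so it understates the real figure by whatever maintenance, software, and DASD would add:

# Minimum $/NotesMark for the S/390 result, using the tech-news base price.
base_cost = 3_735_000   # before maintenance, software licensing, or DASD
notesmark = 42_508
print(f"${base_cost / notesmark:,.2f} per NotesMark")   # about $87.86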

There are only two other reports available for this particular version of the benchmark, both for PC servers running NT. The faster of these, an IBM Netfinity 5500 M20 with two 550MHZ Xeons, scored 10,957 for 8250 concurrent users at a total cost of $10,419 or about $4.15 per NotesMark.

Dozens of systems are reported on for the marginally more complex R5Mail benchmark. Here, for example, a Sun V880 gets a score of 27,435 for a cost per NotesMark of $6.42 and an IBM P680 with 24 processors at 600MHz achieves 108,000 NotesMarks at a total cost of $2,952,402 or $19.66 per point.

GT.M Financial Transactions

A May, 2001, press release by IBM and Sanchez Associates (donor of the open source GT.M - formerly MUMPS - implementation and maker of a GT.M-based financial system) included the statement:

"Initial testing on the z900 showed strong promise, with initial accrual processing throughput of 5,841 accounts per second on a 10 million account database," said Wayne Ross, Sanchez' engineering manager of systems evaluation.

The Sanchez website offers a whitepaper on their benchmarking effort and PDFs of their reports on the Sun 6800 and IBM S80, but no further information on the Linux for zSeries effort. Their report on the Sun 6800 shows a 24CPU model hitting 7,949 on the same accrual processing task.

This is the only unambiguous performance comparison found and shows the Sun 6800 outperforming the mainframe by about 35% in absolute terms, but lacks comparative pricing information.

Consolidation Examples provided by IBM

The IBM argument for this solution does not anticipate heavy use of the Linux resource in either interactive or server mode. Instead, the focus is on replacing lightly loaded Sun (not Linux or Windows) servers. The IBM Redbook mentioned earlier contains an example showing the kind of consolidation effort the machine is aimed at:

This is the setup we inherit at the fictitious company XYZ.
Table 2-1 Setup for company XYZ
Function      Server Type    # of Servers   Average utilization
File Server   Compaq DL380   10             10%
DNS Server    Sun 5S         4              15%
Firewall      Sun 420R       2              15%
Web Server    Sun 280R       10             15%

Note: Before we go through each of the elements of sizing, keep in mind that many of the calculations we base our sizing on are confidential and cannot be explicitly written out. There are several reasons for this, the most important being we do not want to set a "standard" for how to size. Although this may seem counterintuitive, when one considers how many variations there can be in hardware (notice that our setup is fairly small and homogeneous, which will not always be the case), software, and workload, one can see why we cannot endorse a generic formula with some constants and a few variables. Since each situation is different, each sizing will have to vary accordingly. The intent here is to illustrate the principle, and not the specific implementation. [Chapter 2. Sizing 33]

There are some aspects of this hypothetical consolidation target that are really quite remarkable. For example:

  1. This list, along with other materials cited, suggests that the real target is Sun's server market, not Linux or Windows server consolidation;
  2. "fictitious company" seems to have horrifyingly poor systems management and administration.

Even if we assume that all gear is configured with the maximum number of the most recent CPUs available, we have:

System         Maximum CPUs   Servers Listed   Utilization Shown   Implied cycles needed (MHz)
Sun 280R       2 x 750MHz     10               15%                 2,250
Sun 420R       4 x 450MHz     2                15%                 540
Sun 5S         1 x 440MHz     4                15%                 330
Compaq DL380   2 x 1000MHz    10               10%                 2,000
Total          52 CPUs        26               Ave: 13%            5,120

a total systems requirement that's well within range for a single Dell 8450 or Sun V880.
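
The "implied cycles" column is just CPUs x clock x servers x utilization. Here's a small Python check of that arithmetic; note that the Sun 5S row comes out at 264MHz rather than the 330 shown above, a small discrepancy that doesn't change the conclusion.

# Implied CPU demand in MHz: CPUs x clock x servers x utilization.
servers = [
    ("Sun 280R",     2,  750, 10, 0.15),
    ("Sun 420R",     4,  450,  2, 0.15),
    ("Sun 5S",       1,  440,  4, 0.15),
    ("Compaq DL380", 2, 1000, 10, 0.10),
]

total = 0
for name, cpus, mhz, count, util in servers:
    need = cpus * mhz * count * util
    total += need
    print(f"{name:13s} {need:7,.0f} MHz")

print(f"{'Total':13s} {total:7,.0f} MHz")   # roughly 5GHz of demand in total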

As the IBM authors put it in something of a masterpiece of understatement:

It turns out that server consolidation is most viable when there is some inefficiency in the current operation. [Chapter 2, Page 26]

The sizing case isn't actually worked out (after all the sizing methodology is confidential) so we don't know what performance level this is intended to match, what the disk requirements were, what the cost case is, or why this configuration was chosen.

No such ambiguities exist, however, in LINUX for S/390: Scalability and Competitive Advantage, apparently by a group called Sine Nomine Associates [SNA]. (See also a June 2000 presentation by David Boyes on the same subject.)

Here, SNA proposes that a single S/390 "with support for up to 40,000 virtual servers" and costing "less than 5 million in the first year" can use Linux ghosting to replace:

  1. 500 Sun Ultra Enterprise 2,
  2. 250 Ultra Enterprise 1000; and,
  3. an additional 20 UE1000 servers
for 250 separate clients with I/O intensive collaboration and database applications.

I don't know what a UE1000 was (the SPARCserver 1000 was a much earlier machine) although it is presented here as a quad processor, but I know the UE2 well - I'm typing this on one I got in 1996. The UE2 is a workstation, and not a reasonable choice for a server job - then or now.

If the client requirement called for each customer to have a dedicated machine (but not three dedicated machines), Sun's 450 would have left the organization with 250 machines at about half the price - and the S/390 option would not have met the client's business requirements.

The IBM solution is only possible if there is no business reason for the use of separate servers for each customer. In this situation a cluster of HP K boxes or Sun 6000s would have provided an 80% cost reduction without reducing performance or reliability.

 What's being compared?

                               RAM     Max I/O    Disk    CPU Cycles
 Sun (500 x UE2; 270 x UE1000) 750GB   60GB/Sec   27TB    347-832GHz (1)
 IBM (z900)                    64GB    24GB/Sec   1.5TB   12.3GHz
 IBM as a percentage of Sun    8%      40%        5.5%    3.5 - 1.5%

 (1) The low end reflects the original 167MHz CPUs, the high end the late 1998 upgrade to 400MHz.
This stuff isn't just specious, it's contagious. Another bit of advertorial made available on the IBM site and headlined:

IBM eServer z900 Provides Energy Saving Alternative to Server Farms
Hurwitz and Matterhorn Cite "Secret" Competitive Advantage

claims that:

While a typical configuration of 750 Sun servers costs approximately $620/day in electricity to run, a single z900 -- running the same workload -- costs only $32/day, a power saving ratio of nearly 20-1. The savings are even more dramatic when floor space requirements of a server farm are considered. The average server farm requires some 10,000 square feet of floor space compared with only 400 square feet for a single IBM z900. At an average of 100 Watts per square foot, the savings can be significant.

I think these people are unintentionally illustrating the logical process called "reductio ad absurdum", in which you disprove a claim by showing that its consequences are untenable - there's nothing wrong with the method, but they don't carry things quite far enough to draw serious conclusions.

Had they asked me, I'd have pointed out that the Sun boxes have to be idle 99% of the time for the workload to fit on the mainframe. Allowing for trickle power and spin-up time, that means the Sun gear will be powered down around 97 percent of the time and so cumulatively use less power, and produce less heat, than the mainframe.
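
A rough check, treating the duty-cycle figure as the assumption it is: if the Sun boxes are powered down even 97 percent of the time, the quoted $620/day shrinks below the mainframe's $32/day.

# Power-cost comparison using the figures quoted in the advertorial.
sun_farm_per_day = 620    # dollars/day, 750 Sun servers running continuously
z900_per_day     = 32     # dollars/day, single z900
powered_up       = 0.03   # assumption: Sun servers powered up only 3% of the time

print(f"Sun farm, mostly powered down: ${sun_farm_per_day * powered_up:.2f}/day")  # $18.60
print(f"z900:                          ${z900_per_day:.2f}/day")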

If there were some organizational reason for doing things like having DNS services run on four separate machines those reasons would presumably rule out having them all run on one machine, whether that's a larger Sun box or an IBM mainframe.

Absent such a reason a sensible manager working two years ago would have put in something like a Dell 6400 for the Windows file and print support and a Sun 450 for everything else. Of course, being "sensible" he, or she, would have two of each for redundancy and thus end up with a total of four servers instead of the 26 shown - and a total cost of about $260,000 exclusive of the disk space requirements left unspecified in the IBM document.

Today, of course, that same manager would choose between two Dell 8450s running Linux or two Sun V880s running Solaris to achieve the same services on two machines for a total cost of about $210,000 - or somewhere around two million less than the IBM mainframe proposed in the document.

How does it compare to the alternatives?

Performance

The RedBook cited above also contains quite a lot of commentary on why benchmarks are inappropriate measures of mainframe performance, most of which looked like special pleading to me, and a number of statements like:

An important element of the MCM and PU design is the massive bandwidth available to each processor. The MCM has a total of 24 GB/sec of bandwidth, resulting in an available bandwidth to each processor of 1.5 GB/sec. This is an order of magnitude, or more, greater than traditional enterprise-class UNIX servers. This is significant because the traditional UNIX approach is to try and minimize I/O operations; using the same approach on the z900 architecture will not make maximum use of its capabilities [Chapter 1. Introduction 9]

However, in general these servers have relatively limited memory bandwidth, so that the more frequently cache misses occur and data must be retrieved from main memory, the less the deep, private cache helps. In particular, when the system is heavily loaded and tasks must compete for processor time, each task's working set must be loaded into the private cache each time that task moves to a different processor. It is for this reason that most SMP UNIX servers are typically sized to run at utilization levels of approximately 40 to 50%. [Chapter 1. Introduction 11]

which contradict my understanding of "traditional enterprise-class UNIX servers" from Sun, HP, and DEC/Compaq.

With respect to the memory bandwidth and cache coherency management claims, I believe that the table below is more nearly correct:

                                         IBM zSeries 900 /        Dell 8450 /           Sun 3800 /
                                         Shark Disk Array         210S disk array       A5200 Array
Maximum SMP CPUs                         16                       8                     12
System wide cache coherency maximum      2 x 16MB                 1MB                   96MB
Per CPU external cache                   16MB shared 8 ways       2MB                   8MB
CPU to cache bandwidth                   1.5GB/Sec                3.2GB/Sec             9.6GB/Sec
Cache to RAM bandwidth per CPU           1.5GB/Sec                1.0GB/Sec             2.4GB/Sec
Maximum single controller disk I/O rate  32.0MB/Sec               160MB/Sec             160MB/Sec
Maximum disk I/O channels                144                      4                     12
Maximum combined I/O rate                3.2GB/Sec                640MB/Sec             1.92GB/Sec
Maximum CPU cycles/sec                   12.3GHz (16 x 770MHz)    7.2GHz (8 x 900MHz)   10.8GHz (12 x 900MHz)
System cost (includes OS, 3 years)       $5,200,000 [estimated];  $115,000;             $306,000;
                                         64GB, 1.6TB disk         32GB, 0.5TB disk      64GB, 1.6TB disk
Notes:
  1. The information presented in this paper is as close to right as I can make it. Be aware, however, that I'm not a hardware engineer and will be grateful if you spot and report any mistakes I've made - but please cite a reputable source for your information.
  2. I would have liked to have PA-RISC and Alpha columns in this table but both DEC and HP took very different design routes than IBM, Sun, or Dell. I have experience with the PA-RISC 8600 but not the newer 8700s or the more recent Alphas and so don't feel up to this. If you have the information, please contact me and I'll get it included in the next version of the paper.

To me, the comments on cache utilization seem to reflect a very fundamental design difference between batch oriented processing and interactive work. In the traditional IBM world a process is created first, resources are assigned to it, and then it enters the run queue. Once it is running, the executable switches between data and instruction sources as new transactions and logic arrive, but the main process control loop stays largely "CPU resident" and external resource allocations do not change during the run.

Batch processing on Unix?
You can emulate batch processing on Unix but you can't wholly remove Unix from the system while your batch runs.

There are Unix job schedulers that resemble their mainframe cousins and products like Unikix or transactions processing environments [e.g. Sun MTP/DBM] which simplify both the porting and management of the applications that go with this.

In the Unix world, however, processes are not defined and given resources before they run; they spring into existence and start to run as soon as their contexts are loaded. As a result most Unix CPUs have hardware context management allowing them to completely switch processes within one instruction cycle - or even to run more than one instruction stream concurrently. In effect that creates a large process cache independent of on-board data, address, or instruction caches.

The third claim made, that "traditional enterprise-class UNIX servers" are usually sized to run at 40-50 percent utilization to compensate for memory bandwidth limits strikes me as another example of a cultural difference in perception resulting in a claim that looks perfectly sensible to a mainframer but utterly nonsensical to a Unix user.

In the IBM mainframe world workloads are very tightly scheduled into a 24 x 7 processing envelope. This works because considerable resources are devoted to predicting and managing run-times, thereby allowing systems managers to precisely balance the hardware they license against the workload they expect. Combined with the extremely high cost of a fundamentally scarce resource, this ability to predict and easily measure system utilization has led to capacity planning and utilization management becoming widely recognized professional specialties within the mainframe community.

Neither these specialties nor the need for them exists in Unix. The Unix world is fundamentally interactive - meaning that you cannot predict when someone will start a job, what that job will be, or precisely what resources it will take. About the only thing you can predict with certainty is that individual users will think their jobs take too long to run and that groups of users will launch vast conspiracies against you by starting their longest and most incompetently programmed ad hoc queries at the same time - i.e. just before leaving for lunch, coffee breaks, or staff meetings. As a result the experienced Unix manager always wants all the instantaneous processing resources he can possibly afford - and typically couldn't care less about such touchstones of mainframe management as average system utilization levels.

On the numbers, the mainframe should not be remotely competitive with Unix. The hardware specifications don't match up to those from Sun's midrange (and, by extension, to those describing the Alpha and PA-RISC); the upfront cost appears to be much higher; and IBM has stopped benchmarking the S/390 and its successors against Unix on things like SAP, or TPC, transactions processing.

Nevertheless, when you talk to mainframers they're usually absolutely confident that their "big iron" outperforms everything else - and able to point to roughly 14,000 mainframe data centers in which these machines continue to do some serious "heavy lifting."

I believe that this situation exists for three main reasons:

  1. load averaging;
  2. optimization; and,
  3. selective vision.

  1. batch processing, including on-line batch, produces an averaging effect, spreading workloads generated during the working day across nights and weekends.

    Careful load planning combines with precise capacity management to produce very high system utilization rates because grouping processing requirements into batches averages resource demand over time.

    In the Unix world the primary load consists of handling user interaction and thus occurs mainly while users are at work - typically during less than 25% of the 24 x 7 week. Both systems have to devote off peak resources to things like backup, but a well run mainframe center can use batch control and capacity management to achieve 96% or higher average utilization for 168 hours a week while a well run Unix system will usually average less than 50% utilization during peak hours and 10% during off hours.
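
    To make the averaging argument concrete, here is a small Python sketch; the peak/off-peak split is an assumption about a typical interactive shop rather than a measured profile.

# Weekly average utilization: batch-scheduled mainframe vs. interactive Unix.
hours_per_week = 168

mainframe_avg = 0.96                  # batch control keeps the machine busy all week

peak_hours, peak_util = 40, 0.50      # assumption: user-driven load during working hours
off_hours,  off_util  = hours_per_week - peak_hours, 0.10
unix_avg = (peak_hours * peak_util + off_hours * off_util) / hours_per_week

print(f"mainframe weekly average utilization: {mainframe_avg:.0%}")
print(f"Unix weekly average utilization:      {unix_avg:.0%}")   # roughly 20%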

    Linux kills the assembler advantage

    One reason the mainframe gets far more work done per CPU than you might expect based on experience with Unix is that much of the code used on the mainframe is very highly optimized.

    This efficiency is a consequence of the tens of billions of dollars spent on mainframe coding, performance optimization, and tool development --but it all goes away when you try to run Linux and Linux applications on that mainframe because these are not written in assembler, are not highly optimized to fit the technical environment, and don't have forty years of hardware specific performance tuning behind them.

  2. both OS and application code is very highly optimized to minimize system resource use.

    Linux, like Solaris, has very little assembler; but essentially all of CP/VM's core functionality is done at the assembler or machine code levels. Similarly, virtually all Unix/Linux applications are written in C or higher level languages accessing standard libraries also usually written in C.

    In contrast, most mainframe control environments, including loadable libraries and related systems level applications, are written and maintained very close to the hardware - usually in PL/x or assembler but often with handwritten or at least "tweaked" object code- to use far fewer cycles than their C language Unix equivalents.

    Similarly, most major applications have very long development, testing, optimization, and continuous debugging/improvement processes behind them. Things a Unix programmer would hack as a few PERL scripts pipelined together with some system utilities take months of planning and development before being released to run on a mainframe.

    The key to performance is usually found in data and algorithm design, not the choice of language. Optimizing compilers like GNU C or IBM's PL/x encode hundreds of man-years of experience in converting basic language structures into highly efficient machine code and so generally do this better than ordinary assembler programmers inventing the code for themselves.

    Most data centers, however, work with assembler primarily where analysis shows that compiled code bottlenecks. In those situations order-of-magnitude improvements are common because even optimizing compilers have to generalize, where hand tweaking by people intimately familiar with the specific system can fit code to exactly the hardware and system software installed.

    There are few analogues to this in the Unix/Windows worlds because we don't typically code for a specific system installation. The closest example I can think of, Sun's mediaLib attempt to get more people using the VIS/SIMD instructions on SPARC, is still generic to an architecture - not specific to an installed system. For people willing to make the effort, use of VIS/SIMD can produce average speed-ups in the 6-10 times range for "new media" type processing and four times for some arithmetic processing.

    To put this into a mainframe context, imagine a production Solaris system that bottlenecks on something like the checksum computation needed for packet assembly, so its managers recode just that function using the SIMD short array capability to get about a 5:1 speedup. Because bottlenecks cause other problems, like excessive paging, their removal typically provides a disproportionate overall gain in throughput. Finding and fixing bottlenecks like this is how mainframe code optimization often works.

    The Unix approach minimizes programmer time while maximizing flexibility but demands very powerful computing resources; the mainframe approach substitutes careful planning and detailed optimization for computing power to do the same job with far fewer systems resources.

    How much advantage this confers depends on many factors including the degree of abstraction embedded in the language; the quality of the compiler/interpreter; the application; and, the quality of the coding, but may reach 100 to one for key repetitive passages and average between 2 and 5 to one for the bulk of most applications.

    If you assume an average factor of 4, this means that the 16 CPU mainframe running typical mainframe applications should deliver roughly four times the throughput per CPU when running mainframe control programs and applications than it will running Linux and Linux applications.
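
    Put in cycle terms, and treating the factor of four as the assumption it is, the same board is worth very different amounts of conventional capacity depending on what it runs:

# Effective capacity of a 16-way, 770MHz machine under the factor-of-four assumption.
cpus, mhz = 16, 770
optimization_factor = 4   # assumption: average advantage of tuned mainframe code

raw_ghz = cpus * mhz / 1000
print(f"raw capacity:                      {raw_ghz:.1f} GHz")                         # about 12.3 GHz
print(f"effective, running mainframe code: {raw_ghz * optimization_factor:.1f} GHz")   # about 49 GHz
print(f"effective, running Linux code:     {raw_ghz:.1f} GHz")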

  3. "heavy lifting" can be defined to include only those jobs the mainframe is disproportionally good at.

    To really understand what's going on we need an organizational design [OD] idea, called "mutually contingent evolution." This describes how systems management methods and the perceptions of what constitutes appropriate workload co-evolved with the hardware and software provided by IBM. Thus advances in methods, processes, and perceptions influenced hardware and software refinement while hardware and software change influenced changes in management and perception.

    The perception part of this is important. The old joke about everything looking like a nail to people who only have hammers isn't quite right; the reality is that people who only have hammers tend to see only those things they can hit with that hammer, and then classify what they see as nails. That's what happens here: if it doesn't fit the mainframe mold, it's invisible and not data processing.

    Another OD idea, "resource hurdling" also applies to understanding how this works. This describes the co-evolution of the cost of a resource and the organizational hurdles put in the way of using that resource. Make a resource scarcer or more expensive and organizational barriers to its use will grow rapidly; make it cheaper or more plentiful and those barriers will tend to shrink slowly.

    Taken together this means that the workload, the technology, and the management ideas around them all co-evolve together producing, in this case, the result that mainframe costs justify the cost of the access hurdles and controls put around it, and the combined cost justifies the organizational hurdles put in the way of changing those controls.

A Unix system like a PC running Linux or BSD is a general purpose machine capable of handling a wide variety of jobs ranging from highly interactive to batch. Mainframes, on the other hand, are specialized machines purpose built for exactly one kind of job: processing large numbers of relatively simple transactions. Even interactive environments like TSO load as batch jobs that loop to read and process content sent from block mode terminals and therefore only emulate interactive processing of the kind native to the Unix kernel.

From a user management or organizational perspective these amount to coping mechanisms that both individually and collectively raise systems cost while compromising systems performance. Batch processing is resource efficient from a systems perspective - in fact, it started as a way to make maximum use of very small processing capacities. From a corporate perspective, however, this kind of resource efficiency is important only so long as the cost of the resource is high relative to the cost consequences of the workarounds needed to minimize use of that resource.

Given the enormous cost of mainframe computing, the organizational costs incurred, including:

  1. processing delays;
  2. reduced flexibility; and,
  3. the shift of process control out of user hands.

are easily justified. Step outside that environment, however, and the low cost of Unix processing power reverses the balance making it more important to get organizational benefits like:

  1. reduced response times;
  2. shifting control into user hands; and,
  3. enhancing the organization's ability to adapt to change

than to save a few dollars in capital cost by trying to spread processing loads over the full 24 x 7 week.

Fundamentally that's what's wrong with running Linux on the mainframe: all of the machine's design advantages are shunted aside by the interactive nature of the workload, the inefficiency of the software in hardware terms, and the unpredictability of usage demand --leaving only its high costs in place.

A salute to the Show Me! state
None of the benchmark results are definitive with respect to the cost/performance tradeoff and this product's positioning relative to other Unix offerings including Linux and BSD on the PC, Solaris on SPARC, Tru64 on Alpha, HP-UX on PA-RISC, and even AIX on the Power4. On the numbers we have, Linux on the mainframe looks like a loser, but we don't actually know because we don't have access to real test data.

As you'll see in next week's second article on this topic, I think that Linux on the mainframe has an important role to play and ought to be considered for use in many data centers - i.e. that there is an IBM value proposition to be considered. At the same time, however, I don't think mainframe Linux is remotely competitive with Linux on x86 or Solaris on SPARC on either cost or performance.

The way to find out whether this is right or wrong is to run actual tests. Third party, audited, tests with verifiable results and full information on the real costs of using the products. In the third article in this series I'm going to propose a framework for this - and hope IBM responds in a positive way.

Other claimed benefits

As noted earlier IBM claims five main benefits for mainframe Linux:

All of these are, I believe, highly questionable.

Reliability

The mainframe's reliability reflects its use in a system that includes an extremely well defined set of management methods. The hardware is reliable, but no more so than that from other manufacturers making comparable quality gear; it is the management methods which go with the hardware that make the combination extremely reliable. Take away those management methods, and the claimed reliability benefit is unlikely to materialize.

Isolation

The isolation benefit is theoretical rather than practical for any significant number of Linux instances because:

  1. the isolation exists only at the secondary software level. Each Linux instance can be independent, but all of them depend on multiple single points of potential failure; for example, at the hardware and VM levels;
  2. performance management requires resource sharing. Most systems managers will quickly recognize that you can run N Linux ghosts on one machine either by duplicating static resources like /, /usr, and /opt N times or by giving every copy access to a common resource. Given system-wide paging limits and no more than 64GB of total memory, sharing is extremely attractive because it reduces paging and dramatically improves overall performance by increasing the likelihood of cache hits on shared executables and data (see the rough arithmetic sketched after this list).
    As a result, a zSeries with N Linux ghosts running is very unlikely to allow full SMP, carry N separate copies of the boot image, /usr, and /opt, or give each ghost default access to 2GB of memory. This type of sharing, of course, creates additional shared points of failure whose existence and importance then cast doubt on the inter-ghost isolation offered as one of the primary benefits of running Linux under VM.
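
To get a feel for why sharing wins, here is some rough arithmetic in a few lines of Python. Every figure is an illustrative assumption - a hypothetical guest count and image size, not a measurement of any real configuration:

    # Rough arithmetic only: all figures below are illustrative assumptions,
    # not measurements of any real zSeries configuration.
    GHOSTS = 40                 # hypothetical number of Linux guests
    STATIC_IMAGE_GB = 1.5       # assumed size of /, /usr and /opt per guest
    SHARED_PAGES_MB = 200       # assumed resident footprint of shared binaries

    duplicated_disk = GHOSTS * STATIC_IMAGE_GB   # every guest carries its own copy
    shared_disk = STATIC_IMAGE_GB                # one read-only copy serves them all

    print(f"Disk for duplicated images: {duplicated_disk:.0f} GB")
    print(f"Disk for one shared image:  {shared_disk:.1f} GB")
    print(f"Memory if each guest caches its own binaries: "
          f"{GHOSTS * SHARED_PAGES_MB / 1024:.1f} GB")
    print(f"Memory if binaries are cached once and shared: "
          f"{SHARED_PAGES_MB / 1024:.2f} GB")

Whatever the real numbers turn out to be, the shape of the result is the same: duplication multiplies both disk and cache footprint by the number of ghosts, which is exactly why sharing is so hard to resist.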

Access to open source software

Quite aside from the obvious fact that Free Linux for the mainframe costs in excess of $11,000 per CPU plus several thousand per month for support, there's the problem of bridging the gap between the fundamentally interactive nature of Unix and the batch oriented mainframe architecture.

For example:

  1. SMP capabilities in the 2.4 and later kernels have no obvious applicability in the VM context, and emulations like micro-partitioning waste system resources.
  2. Linux memory management assumes it controls the whole machine and so grabs free memory for use in I/O buffering. Having multiple Linux instances do this to independently buffer I/O to the same files resident on a shared mini-disk not only wastes memory but dramatically increases the paging effort.
  3. Linux OS paging/swapping is done to a swap device - but VM already pages independently. To avoid paging chains - in which Linux pages out pages that VM first has to page in - the smart thing to do is to set up swap to a ramdisk. That, in turn, imposes severe limits on the size of each ghost because the mainframe is limited in the amount of memory it can use.
  4. The Linux scheduler depends on a timer that typically checks for new work about 100 times per second. If this fires while VM has the Linux guest OS paged out, VM has to page the entire guest back in just to run the check process and then page it out again (a back-of-the-envelope sketch follows this list).
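
To see the scale of the timer problem, consider a few lines of Python. The guest count and working-set size are hypothetical assumptions, and the second figure is a worst-case upper bound rather than a prediction:

    # Back-of-the-envelope only: guest count and working-set size are assumptions.
    GUESTS = 40          # hypothetical number of mostly idle Linux ghosts
    TICK_HZ = 100        # traditional Linux timer frequency
    WORKING_SET_MB = 64  # assumed memory VM must page back in per wakeup
                         # when the guest has been paged out

    wakeups_per_second = GUESTS * TICK_HZ
    print(f"Scheduler wakeups VM must service per second: {wakeups_per_second}")

    # Upper bound: every wakeup hits a fully paged-out guest.
    worst_case_gb_per_sec = wakeups_per_second * WORKING_SET_MB / 1024
    print(f"Worst-case paging traffic: {worst_case_gb_per_sec:.0f} GB/sec")

No real system approaches the worst case, but the arithmetic shows why a pile of nominally idle ghosts is anything but free under VM.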

The most effective route to resource sharing is to move the applicable Linux functions out of the Linux guest OS and handle them in VM instead. Consider, for example, the effect of the 64GB memory limit and the use of a 1GB ramdisk as a way of limiting Linux paging chains. If you have Linux create the ramdisk, it becomes subject to VM paging and those overheads will prevent you from setting up more than perhaps 30 to 40 guest instances. If, on the other hand, you use VM to partition memory and create the ramdisk used for swapping, you pretty much have to make it a shared resource because otherwise you'll run out of system memory at no more than perhaps 30 to 40 Linux instances.
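
The 30-to-40 figure is easy to sanity check with a simple memory budget in Python. Everything below except the 64GB ceiling is an assumption chosen only to illustrate the shape of the constraint:

    # Illustrative memory budget; only the 64GB ceiling comes from the text.
    TOTAL_MEMORY_GB = 64        # zSeries memory limit
    VM_OVERHEAD_GB = 4          # assumed VM and shared-infrastructure footprint
    SWAP_RAMDISK_GB = 1.0       # per-ghost ramdisk used as the Linux swap device
    WORKING_SET_GB = 0.7        # assumed resident working set per ghost

    per_ghost = SWAP_RAMDISK_GB + WORKING_SET_GB
    max_ghosts = (TOTAL_MEMORY_GB - VM_OVERHEAD_GB) / per_ghost
    print(f"Memory per ghost: {per_ghost:.1f} GB")
    print(f"Approximate ceiling on Linux ghosts: {max_ghosts:.0f}")

With these assumptions the budget runs out at roughly 35 ghosts; pick different per-ghost numbers and the ceiling moves, but it stays in the same few-dozen range unless resources are shared.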

In the longer term these kinds of conflicts between the basic interactive design at the heart of Linux and the design of the mainframe will need to be resolved. Some of the differences are hardware specific - big-endian versus little-endian issues, assembler issues, the absence of sigcontext capabilities, and so on - but the more important ones are fundamental to the operational concepts behind system deployment. Linux embeds very basic assumptions about usage patterns and access to hardware that have to be worked around on the mainframe. For example, IBM recommends the use of telnet instead of a native GUI like KDE or GNOME, while issues like memory management and timer control require significant kernel patches to work efficiently on the mainframe - and become un-Linux-like in the process.

As a result mainframe Linux has different operational values than does desktop x86 based Linux and those differences are ultimately reflected first in usage patterns and then in code adapted to those usage patterns.

IBM has done a lot of work on this already and provided a starting point for people considering porting Linux applications to mainframe Linux. These change requirements are extensive - so much so, in fact, that ordinary users downloading code cannot reasonably expect to make those changes on the fly or through simple tools like GNU's configure utilities. Over time, therefore, I expect that we'll see applications that run on Linux but not on mainframe Linux, and applications that run on mainframe Linux but not on x86 Linux --thus voiding this claimed benefit.

Power

There's little reason to question this. Running one IBM mainframe uses less power than running 750 Sun or PC servers. No ifs, buts, or maybes; this would be a real benefit, however trivial next to the cost of the Linux and VM licenses, if the mainframe could handle the same load - something I don't believe.

Networking

The networking benefit demonstrates the kind of logical fallacy known as "affirming the consequent" in which you prove eggs by assuming chickens and then prove the chickens by pointing at the eggs.

The argument is that if you replace several hundred Linux PCs with one zSeries, all of the networking gear and resources previously needed to allow the Linux PCs to communicate with each other become virtual connections within the VM environment. Since these new connections are faster and have essentially no maintenance costs, the elimination of the previous networking costs amounts to a zSeries benefit.

The conclusion is obviously correct, except that it assumes both an unlikely problem - a need to replace hundreds of Linux machines with Linux ghosts - and its solution - replace hundreds of Linux machines with Linux ghosts.

Consider that there are two usage scenarios where the requirement might exist:

  1. you have hundreds of Linux PC desktops to replace; or,
  2. you have hundreds of Linux servers to replace;

but:

  1. you don't care about the multiple single points of failure inherent in running these on one machine; and,
  2. you have some compelling reason not to run your Linux emulations using Linux products like Virtuozzo on a native Linux machine like a Dell 6450;
then:
  1. if you replace the desktop Linux PCs with something like an IBM Netvista smart display connected to a matching Linux ghost on the mainframe, the network savings come from the smart display, not the mainframe; but,
  2. if you restrict - as IBM generally does - the use of those Linux ghosts to server functions, then you could go one step better and get rid of both the real and virtual networks just by loading all of the server tasks onto a single Unix SMP machine.

Summary: what we know

On a "raw iron" basis the machine beats high end PC servers but doesn't stack up against mid range Sun gear (and, by extension, against competing PA-RISC and Alpha products).

On a workload applicability basis the gear fits Linux about as well as snowshoes go on a downhill ski racer.

For an easily severable workload like Domino, the same benchmark results that show a 10-way mainframe getting a NotesMark score of 42,508 suggest that a cluster of five Dell 2450 PC servers, for a total of around $61,000, would blow it away on both absolute and relative performance.

We don't have a comparative performance benchmark for a mixed workload of relatively small but unpredictable tasks of the kind Linux on x86 excels at. What we do have, however, is comparative cost information on the basic systems. At list price, you could rack up eighty (80) Dell 8450 servers each with:

  1. four 900MHz Xeon processors, each with 2MB cache
  2. 16GB of RAM
  3. 4 x 73GB US3 Disk
  4. Red Hat Linux pre-loaded
for about the same money as one fully configured z900.
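
The comparison is easy to check in a few lines of Python. The budget figure is the $5,251,000 zSeries estimate quoted in the table below - an approximation, since "fully configured" and the table's configuration may not match exactly:

    # Rough list-price arithmetic; the budget is the zSeries estimate
    # quoted in the comparison table below, used here as an approximation.
    Z900_BUDGET = 5_251_000
    SERVERS = 80

    per_server = Z900_BUDGET / SERVERS
    total_cpus = SERVERS * 4       # four Xeons per Dell 8450
    total_ram_gb = SERVERS * 16    # 16GB of RAM per server

    print(f"Budget per Dell 8450: ${per_server:,.0f}")
    print(f"Aggregate processors: {total_cpus}")
    print(f"Aggregate RAM: {total_ram_gb} GB")

That works out to roughly $65,000 per server, and the cluster's aggregate of 320 processors and 1,280GB of RAM dwarfs the single mainframe's 16 CPUs and 64GB.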

At the moment Linux doesn't scale well past four processors and about 4GB of RAM, but other Unix variants do. If your workload needs massive SMP capabilities with flat memory spaces above 16GB, the place to start is the Sun 3800 at about $350,000 fully configured, but the only direct comparison for which we have performance indicators is with the larger 6800:

                                                 IBM zSeries (2064-116)   Sun 6800       Sun 6800 as % of zSeries
 Maximum CPUs (total MHz)                        16 (12,320)              24 (21,600)    175%
 Maximum system throughput                       24 GB/sec                67.2 GB/sec    280%
 Maximum system memory                           64GB                     192GB          300%
 Estimated cost (1.5TB disk, 16 CPUs, 64GB RAM)  $5,251,000               $960,000       18.3%
Note that the 18.3% price comparison is a best case for the mainframe and assumes a workload justifying the flat memory space and high reliability of the 6800. If the workload is easily severable, like Domino or a Windows file and print service, you could expect to achieve about the same throughput with a cluster of eight four-way PC servers like the Dell 8450 at less than 10% of the mainframe's cost.

On the Sanchez financial transactions benchmark, the kind of thing that constitutes home field advantage for the mainframe, the 6800 beat the mainframe by about 35% - at about one fifth the cost.
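
Combining the two figures gives a rough price/performance ratio. This is illustrative only - the benchmark configuration and the configuration costed in the table above may not be identical:

    # Rough price/performance arithmetic from the figures quoted above.
    sun_cost_ratio = 0.183   # Sun 6800 cost as a fraction of the zSeries cost
    sun_perf_ratio = 1.35    # Sun 6800 benchmark result relative to the zSeries

    advantage = sun_perf_ratio / sun_cost_ratio
    print(f"Sun 6800 price/performance advantage: roughly {advantage:.1f}x")

On those assumptions the 6800 delivers something like seven times the work per dollar on the mainframe's own kind of load.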

The Next Question

To put it nicely, Linux on the mainframe looks like a loser, so why is IBM, a serious company with big dollars at stake, telling us that Linux on the zSeries is a good idea? That's the topic of next week's article.