Questioning IBM - discussions

Questions and Answers

Good news about UTS

The article referred to Amdahl's mid-eighties UTS product and asked if anyone knew what had happened to it since. It seems it's alive and well:

Dear Mr. Murphy,

UTS is very much alive and kicking. Check out our web page at:

http://www.utsglobal.com

We are not a part of Amdahl and haven't been for about 2 years.

UTS is still very much in use, check out our customer list. You'll find
some very large customers. Our market is chiefly oriented around the
telecommunications industry.

UTS Global is also active in the Linux/390 community. We've contributed
3270 support to Linux/390 2.4 and have released CLAW and tape drivers
under the UTS Global Public License.

Dennis Andrews
Chief Software Architect
UTS Global LLC

but I'm ignorant, incompetent, untrustworthy, and dishonest

There are three things in this paper that triggered a lot of response:

  1. my comments about the history of VM;
  2. my comments about the use of assembler; and,
  3. my comments about paging in VM when trying to create 41,400 instances.

Two VM experts who reviewed the paper prior to publication warned me about all three. In retrospect I wish I'd been smart enough to listen to them.

Comments about paging

I was dead wrong on paging. I had forgotten about VM's use of re-entrant code in which all of the guests running share the same code image.

This raises other issues and reinforces my belief that all 41,400 images were essentially identical, shared everything, and did nothing, but my ascription of paging needs to them was certainly stupid.

Not only was this wrong, but the mistake buried the main point: that the issue isn't running 41,400 copies of something, but getting those images to do "real work." If they don't have to do anything, independence isn't an issue, and there's no need for an external network connection, then this is easy. The problem is that third parties, not IBM or David Boyes, are interpreting this experiment as proof positive that you can take 41,400 real x86 Linux servers, virtualize them on a single two-way mainframe with 128MB of RAM, and get the same level of service you started with.

This is nonsense; it's just as if I reported that I can start 10,000 "sleep 7200" processes on a Model 80 (true) and other people then ran around claiming that this proves its ability to serve 10,000 concurrent users.

Comments about Sendmail

We've had considerable discussions with the Sendmail folks. As part of this they've done a full mstone report on tests using the z900. Originally they said we could publish data from it and would provide a pointer so you could download the document yourself to check it; they then sent us an email saying that all of the information is confidential.

Pending a full and careful review of the document, and their permission to release data from it, I can say that it looks okay. You need to be aware, however, that the claim that a z900 can support two or three million email users, if in fact supported by the data, applies only to very low volume POP3 users on dial-up modem lines. The more realistic IMAP numbers drop by three orders of magnitude -- from millions to thousands.

It's important, when evaluating this, to bear in mind the key assumptions about usage patterns involved. In these tests only 10% of POP3 users are considered active in any half-hour interval. Thus a claimed "400,000 users" really means 400,000 accounts and implies only 40,000 concurrent users.
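The accounts-to-concurrent-users conversion is simple arithmetic; here it is as a minimal sketch, using the 10% activity ratio assumed in those tests:

```python
def concurrent_users(accounts: int, active_fraction: float) -> int:
    """Convert a claimed account count into concurrent users,
    given the fraction assumed active in any half-hour interval."""
    return int(accounts * active_fraction)

# Under the test's assumption that only 10% of POP3 accounts are
# active at once, a claimed "400,000 users" means 40,000 concurrent.
print(concurrent_users(400_000, 0.10))  # prints 40000
```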

I've found many references to a Shiloh/Haynes test done in 1998 that supposedly showed 100,000 concurrent POP3 users along with 20,000 IMAP users (i.e. 120,000 "real" users on at the same time) running on a Sun 6000, but not the document itself. I've also found a fuller report, from 1999, of a test by the same group using Oracle's messaging server on a Sun 6500 with 22 CPUs running at 336MHz. In that test the 6500 supported 360,000 concurrent users, of which 350,000 were POP3 and 10,000 were IMAP. Using the logic offered in the Sendmail study, these 350,000 POP3 users indicate the ability to support more than 3.5 million POP3 accounts on a 22 x 336MHz box or, by linear extension using the (probably incorrect) assumption that all processing is CPU bound, about 10.2 million accounts on a Sun 6800.

That number is unrealistic but directly parallel in derivation to the numbers in the Sendmail claim for the z900.
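For what it's worth, the extrapolation can be reproduced in a few lines. Note the hedges: it assumes throughput is purely CPU bound and scales with aggregate clock rate, and the 24 x 900MHz configuration used for the Sun 6800 is my assumption for illustration, not a figure from either report:

```python
# Sketch of the linear extrapolation used above. Assumes throughput is
# purely CPU bound and scales with (cpu_count * clock_mhz); the Sun 6800
# configuration (24 CPUs at 900MHz) is an illustrative assumption.
def scale_accounts(base_accounts: float, base_cpus: int, base_mhz: int,
                   target_cpus: int, target_mhz: int) -> float:
    return base_accounts * (target_cpus * target_mhz) / (base_cpus * base_mhz)

pop3_concurrent = 350_000                 # measured concurrent POP3 users on the 6500
accounts_6500 = pop3_concurrent / 0.10    # 10% activity ratio -> 3.5 million accounts
accounts_6800 = scale_accounts(accounts_6500, 22, 336, 24, 900)
print(round(accounts_6800 / 1e6, 1))      # prints 10.2 (million accounts)
```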

For a more realistic comparison to running Sendmail under Linux on a z900, check the mstone report. It covers similar tests run against a uniprocessor Dell P3 at 733MHz with Red Hat 6.2 and suggests that a more current eight-way Xeon running at 900MHz would probably prove faster on this test than the z900.

If, or when, Sendmail releases the data, I'll put a followup analysis here or on my winface.com home site.

C vs Assembler

Many people insist that most of z/OS is now C, that assembler no longer has a key role, and/or that PL/X isn't "close to the hardware." This entire argument misses the point.

I'm not saying assembler is always faster than C, that PL/X is better than C, or that applications are generally coded in either assembler or PL/X. I'm saying that using assembler to debottleneck applications is:

  1. common practice for mainframers but not for Unix users;
  2. very effective;
  3. part of a 40-year tradition of debugging that has led to high levels of optimization in system libraries and other OS-related facilities; and,
  4. a contributing cause of the mainframe's ability to run at high throughput rates, whose benefits go away when you run Linux on that mainframe.

History

On the history of VM, the overwhelming majority of comments were simple ad hominem attacks.

In my opinion, however, these people are responding more to my failure to subscribe to core community beliefs than to any need to defend the validity of those beliefs. I'm sorry to offend, but insisting on allegiance to a commonly accepted Truth only illustrates the role of that acceptance as a touchstone for membership in the community of believers; it does not amount to actual evidence for the validity of the belief.

If you are interested in a humorous (well, tragicomic) version of the early history, take a look at this one. It's bitterly funny, and close enough to reality that it's not obvious whether the authors intended those of us who had to live with this stuff to laugh or cry.

and I'm shilling for Sun too

I do not work for Sun and I do not sell Sun products. I do, however, use Sun gear personally and generally recommend it to consulting clients because it meets my number one criterion for computer systems: it works essentially all the time.

Note to Scott McNealy
On the other hand, I could be bribed. Do you need someone to review that 20th anniversary workstation? My wife has a Twentieth Anniversary Mac I'd be happy to compare it to -- but remember, anything less than 2.1GHz, 8GB, the 24" LCD, and 4 x 73GB disks, and that Mac will prove prettier, faster, and just generally better!

In the article I try to compare to Linux on x86 whenever possible, but the combination of IBM's apparent focus on replacing Sun servers, not Linux servers, and the relative scale of the mainframe forced a lot of Sun comparisons too. So far no one has come forward with comparable PA-RISC and/or Alpha numbers, but I suspect that when I do get them they won't be much different from Sun's.

But it works! and saves heaps of real money!

Yes, Linux on the mainframe works pretty well. Technically it's very cool stuff - but I don't believe it's as widely applicable as people claim, and I haven't seen any legitimate benchmark results suggesting it to be remotely cost or performance competitive with Unix -- whether that's Linux on x86, Solaris on SPARC, HP-UX on PA-RISC, Tru64 on Alpha, AIX on Power4, or almost any other combination.

The main theme hammered at by those who wrote to me about cost savings was that people who already have a mainframe installed can use it, virtually cost free, to run Linux applications. Unfortunately this logic exemplifies something known as the sunk cost fallacy.

The sunk cost fallacy is a big thing in behavioral economics and finance. Behaviorally, it's usually an attempt to recoup already-spent dollars by spending more dollars; technically, it's a failure to limit decision-making considerations to the costs and benefits of the current decision, by including consideration of monies already committed or benefits already earned. Search for "sunk cost fallacy" using a search engine like Google and you'll find lots of examples.
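A minimal numeric sketch of the point (all dollar figures here are hypothetical, chosen only to illustrate the reasoning):

```python
# Hypothetical numbers, purely to illustrate the sunk cost fallacy.
# The correct comparison uses only incremental costs; whatever was
# already spent on the mainframe is sunk and identical under either
# option, so it is irrelevant to the current decision.
mainframe_purchase = 2_000_000        # sunk: already spent either way
incremental_mainframe_linux = 26_000  # e.g. setup, software, capacity used
incremental_x86_server = 8_000        # e.g. a commodity Linux box

# Fallacy: "the mainframe is free because we already paid for it."
# Correct: compare only the costs the decision actually changes.
options = [("mainframe Linux", incremental_mainframe_linux),
           ("x86 server", incremental_x86_server)]
best = min(options, key=lambda t: t[1])
print(best[0])  # prints: x86 server
```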

A case in point is the eWeek discussion of mainframe Linux cited in the first article. The report does not give enough information to support a final judgment about what happened, but it clearly points to people who think they "saved heaps" of money by using Linux on the mainframe to run 700 email accounts - at a cost of $26,000 plus at least 7% of the mainframe. I think a real investigation would show that these people tripped over the sunk cost fallacy to create what amounts to an IT management version of "America's Funniest Videos": the kind of thing in which people get hurt on camera and we all laugh because it isn't us. Check slashdot for the laughter, then take a good look at that story while asking yourself about the likely total costs to the business owners.

Batch vs Interactive

A number of people see my characterization of the mainframe hardware environment as batch oriented rather than interactive as terribly wrong. Well, so did IBM's Joann Duguid, and this is clearly an issue of serious importance to the VM community. This is another case where I wish I'd said it differently but maintain the correctness of the basic argument: this hardware is highly optimized for batch throughput, not for the delivery of interactive user services.

And, in response to the gentleman whose comment:

He obviously has never used CMS on VM, it's as interactive and responsive as any Linux system I've used

was forwarded to me by a third party, I do have a question: are you saying that you used a GUI like CDE, GNOME, or KDE with CMS, or that you've never used Linux?

Deep Throats and other future friends

I've received some interesting, but not yet confirmed, information on costs and performance claims made by IBM sales people. In brief, total costs are said to be higher than those in the article, and claims include things like replacing 7,000 two-way Linux servers with a four-way mainframe. We've asked IBM to confirm the authenticity of one of these documents and will report further.

Various people have contributed results from running simple benchmarks like bonnie++ on mainframe Linux.

Here are some sample results:

                                ------Sequential Output------ --Sequential Input- --Random-
                                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Machine                   Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
 IBM G5                    184M  2153  99 14790  14  3562   7  2311  98 98460  97 859.4  10
 IBM (G5/Shark)            256M  1334  99  9001  43  4506  10  1347  99  9945  10 433.8   6

 1.4GHz Athlon               4G 16842  92 21793  14  7058   4 16540  81 27016   9  78.2   0
 Sun UE2 (15K-RPM; /tmpfs)   1G  6198  99 53081  99 43184  97  5887  99 115902 99 12679 177
 Sun UE2 (4.2GB/7.2K; hfs)   1G  4341  74 11605  35  6318  29  5387  97 19638  37 355.7  10

Three notes:

  1. I have received eight bonnie++ reports for the mainframe. These are the two fastest reported.
  2. The results shown in the first UE2 report were run with parallel I/O to a pair of 15,000 RPM drives mounted as swap - that's why the random seek number is off scale.
  3. The second UE2 report is for a machine with its original two 167MHz processors (from 1997), using the standard file system on the 4.2GB, 7,200 RPM drives it came with. This machine is therefore directly comparable to the slower 500 of the 770 Sun machines Sine Nomine claimed could be replaced by a single mainframe.