Linux vs. Unix

More Reader Reaction

About a week ago I received a detailed analysis of floating point performance on what the writer claimed was a dedicated z900 engine running SuSE Linux. The numbers were surprising enough that I subscribed to the Linux-390 list hosted by Marist College specifically so I could ask its members to run a few tests to confirm or refute them.

What I got back was mostly personal abuse intended to discredit the questioner rather than to confirm or refute the findings. Stuff like this:

From: "Phil Payne"
To: "rudy"
Subject: Re: Request for help on floaing point timing
Date: Sat, 18 May 2002 13:20:02 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700

> I've been writing a series of articles (as Paul Murphy) in
> linuxworld.com on IBM's reticence with respect to the cost and
> performance characteristics of their mainframe Linux offerings. As a
> result I've received some interesting mail including some results I
> want to cite in public but can't either because the sender didn't want
> that or because they are uncorroborated.

Synthetic loops were discredited over three decades ago. The last IBM mainframe that performed predictably on them was the 1964 IBM System/360 Model 50.

You might as well measure the output of the loudspeakers on an ocean liner when comparing it with a Jumbo jet. No information of any value whatever will result.

I'm not sure what worries me the most - the fact that someone with such appalling knowledge of performance measurement is writing articles at all, or the fact that someone is publishing them.

Phil Payne
http://www.isham-research.com
(Co-founder and past vice president, UK Computer Measurement Group)
(Quoted with permission)

qualifies as mild and restrained by comparison to some of it.

The example behind the question was the time it takes to run a typical work unit in the SETI@home series on mainframe Linux (about 43.7 hours) compared to the 3.5 hours typical of Linux on a P4.

It's true that a single run of SETI@home doesn't provide a reliable guide to a machine's floating point performance, but run the thing 409 times and the average should tell you something meaningful about its performance relative to other gear running the same code against similar data. That's not, however, how many of the people who routinely contribute to this listserver see it. To them, not only is this benchmark unfair and pointless, but all others are too.
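To put the two SETI@home figures side by side, and to show why an average over many runs says more than a single run, here is a minimal Python sketch. The function names `slowdown` and `mean_and_stderr` are illustrative only and are not part of SETI@home or any benchmark suite; the only real data used are the 43.7 and 3.5 hour figures quoted above.

```python
import math
import statistics

def slowdown(slow_hours, fast_hours):
    """How many times longer the slower machine takes per work unit."""
    return slow_hours / fast_hours

def mean_and_stderr(run_hours):
    """Mean run time and the standard error of that mean.

    The standard error shrinks with the square root of the number of runs,
    which is why hundreds of work units tell you more than one does.
    """
    n = len(run_hours)
    mean = statistics.mean(run_hours)
    stderr = statistics.stdev(run_hours) / math.sqrt(n) if n > 1 else float("nan")
    return mean, stderr

# Figures quoted in the article: mainframe Linux vs. Linux on a P4.
ratio = slowdown(43.7, 3.5)
print(f"mainframe takes about {ratio:.1f}x as long per work unit")
```

On the quoted figures the mainframe takes roughly 12.5 times as long per work unit. Feeding a real list of per-run times to `mean_and_stderr` would show how tight that estimate is.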

The mainframe gets PII class results on Bonnie and Bonnie++? It's an unreliable benchmark.
The mainframe gets PII class results on IEEE floating point? Floating point isn't important.
Poor GNOME or KDE performance? Only server functions count, and Linux memory management is badly designed anyway.
It takes 0.77 CPU seconds to process one email? That's an atypical task which does not fairly show the mainframe's strengths.

In reality these are the kinds of tasks Linux excels at. They don't fit the mainframer's idea of data processing, but Linux performance on the mainframe must be evaluated in the context set by other uses of Linux, not other uses of z/OS. What they're doing, excusing the poor performance of Linux in their environment by denigrating the kind of interactive computing it is best at, is a lot like insisting that a one-legged man can win every marathon without entering, simply because the speed and endurance shown by those who would finish ahead of him if he ran are inappropriate measures on which to pick winners.

As the first article pointed out, a Sun 6800 midrange has about 30% more processing capacity than the high-end mainframe at less than one fifth the cost. But this, like benchmark results, is also dismissed as immaterial. Here's the canonical response as enunciated by David Boyes:

MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Approved-By: David Boyes
Message-ID:
Date: Mon, 22 Apr 2002 12:49:44 -0400
Reply-To: Linux on 390 Port
Sender: Linux on 390 Port
From: David Boyes
Subject: Re: LinuxWorld Article series
In-Reply-To:

> But - he's comparing one mid-range sun to one z/900. Seems like
> the 37% people and remainder facilities would be the same in both
> of those. One sun should be just about as much work/power as one
> z/900.. in fact, I'd expect one mid-range sun to be a little lower
> on the power/HVAC requirements.
This is probably more garbled than I want it to be, but I'm short of time, and running out of battery in the laptop. For *one* application, one box, he's probably ok. It's taking the larger view of the fact that most organizations don't have only one application, nor do they have one box per application. Let's think about box count first for a moment. Consider that for most organizations, when you deploy Application X's server in production you need some extra hardware to make the solution supportable. You need:
1) the production box itself
2) a backup server or hot spare in clustering environment (we are talking mission crit apps)
3) a development box
4) a test/QA box
5) possibly a regression box in case more than one version is in production at any given time.
So, the comparison of one z900/z800 is actually against 4, possibly 5 Sun boxes per application deployed. You can't double up on the test systems because you need them to mirror production to be a valid test; and you sure don't want developers testing on the same box. So, assuming worst case of 5 boxes per application, we've erased most of that 18% number down to about 2-3% overall. What happens when the next application comes along, call it Application Y? You now need new boxes for that application. Can't use the others because they're dedicated to application X. So, for Application Y, you now need 1+4 *more* servers. We're now up to 8 servers for two applications. The trend is clear.
(From the Marist College VM-Linux List)
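The box-count argument in that email is simple multiplication, and a short Python sketch makes the numbers easy to check. This is only an illustration of the arithmetic as the email states it; `boxes_needed` is a hypothetical helper, and the assumption that every application needs its own fully dedicated set of boxes is the email's, not an established fact.

```python
def boxes_needed(applications, boxes_per_application):
    """Total servers if every application gets its own dedicated set of boxes."""
    return applications * boxes_per_application

# The email speaks of "4, possibly 5" boxes per application.
for per_app in (4, 5):
    for apps in (1, 2, 3):
        total = boxes_needed(apps, per_app)
        print(f"{per_app} boxes/app, {apps} application(s): {total} servers")
```

With four boxes per application, two applications need 8 servers, which is the figure given in the email. With the stated worst case of five (production, backup, development, test/QA, and regression), two applications need 10.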

COST COMPARISON
[From what seems to be an IBM sales document]

LINUX ON S/390 vs. Microsoft on INTEL

Estimated Five year Cost: Using Linux on S/390 for Web-hosting and General Purpose (Misc. application/data base servers)

                          75 Images   150 Images    300 Images   7000 Images

Linux on S/390           $6,472,110   $8,224,760   $12,596,972   $38,670,424

Intel Servers            $7,959,235  $14,596,870   $26,821,820  $218,462,900

Projected Savings        $1,487,125   $6,372,110   $14,224,848  $179,792,476
using Linux on S/390

* Linux on S/390 for 80 and 160 Images assumes deployment on additional engines on existing S/390 processors.
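The table's arithmetic can be checked, and turned into a cost per image, with a few lines of Python. This is a minimal sketch that uses only the figures shown in the table; the variable names are mine, and the per-image division is my own derived measure, not something the document reports.

```python
# Five-year cost estimates copied from the table above (US dollars).
images = [75, 150, 300, 7000]
s390_cost = [6_472_110, 8_224_760, 12_596_972, 38_670_424]
intel_cost = [7_959_235, 14_596_870, 26_821_820, 218_462_900]
claimed_savings = [1_487_125, 6_372_110, 14_224_848, 179_792_476]

savings_ok = []      # does Intel minus S/390 equal the claimed savings?
s390_per_image = []  # five-year S/390 cost divided by image count
intel_per_image = [] # five-year Intel cost divided by image count

for n, s390, intel, claimed in zip(images, s390_cost, intel_cost, claimed_savings):
    savings_ok.append(intel - s390 == claimed)
    s390_per_image.append(s390 / n)
    intel_per_image.append(intel / n)
    print(f"{n:>5} images: S/390 ${s390 / n:>10,.2f}/image, "
          f"Intel ${intel / n:>10,.2f}/image, "
          f"savings column consistent: {intel - s390 == claimed}")
```

The savings column is internally consistent in all four cases. What changes is the implied five-year cost per image on the S/390, which falls from roughly $86,000 at 75 images to roughly $5,500 at 7000.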

As Dave Barry would put it: "I'm not making this up" (or, in British English: "I sham you not"). We sent this document to IBM with a request that they either deny or confirm it's one of theirs and they haven't been heard from since.
That $38.7 million five-year operating cost is, incidentally, for a four-way G6 machine running at all of 660MHz.

One of the odd things you can see happening if you watch the newsgroups on this is a steady increase in the extent to which claims are exaggerated. If 41,400 concurrent Linux instances on a mainframe doesn't impress you, perhaps 99,999 will? No? How about 99,999 in each of 15 LPARs on one machine?

That escalation happens outside the discussion groups too. Think about what it would take to make you honestly believe that you could effectively replace 7,000 Intel-based servers running Linux with a single, four-CPU S/390. Think further about what it takes to write up an eighteen-page presentation on this, with all the cost numbers carefully worked out, and then seriously present it to real clients facing real computing problems.

The sheer emotionalism of the personal response, the exaggeration of unsupported performance claims, and the escalation of commitment in the face of contrary evidence require an explanation that goes beyond the enthusiasm of people who've discovered an exciting new technology.

There are, I think, two interlinked behavioral phenomena being demonstrated here. The list members know what's going on; most of them have daily access to Linux on the mainframe and can see its costs and limitations far more clearly than outsiders can.

In this context their behavior is strikingly reminiscent of what happens in religious communities focused on a single prophecy when that prophecy fails. Consider this summary of the underlying behavioral pattern from "Cutting down the dissonance: the psychology of gullibility" by Columbia University's Christina Valhouli:

In some cases, contradictory evidence can even strengthen the belief. As Leon Festinger and colleagues discussed in When Prophecy Fails, holding two contradictory beliefs leads to cognitive dissonance, a state few minds find tolerable. A believer may then selectively reinterpret data, reinforcing one of the beliefs regardless of the strength of the contradictory case. Festinger infiltrated a doomsday cult whose members were convinced the earth was going to blow up; when the date passed and the earth didn't explode, the cult attributed the planet's survival to the power of their prayers. "When people can't reconcile scientific data with their own beliefs, they minimize one of them--science--and escape into mysticism, which is more reliable to them," says Dr. Jeffrey Schaler, adjunct professor of psychology at American University.

This seems to fit, especially if you compare the reactions of the Marist College Linux-390 group to those of Slashdot readers. The Slashdot readers wander off topic in tribute to the Alpha, and in search of some alleged weakness in the Linux TCP/IP code, but they don't become absurdist, defensive, or abusive either in their postings or in the personal email I've received from them. Maybe it's just me, but they seem a far more balanced and self-confident lot.

Context Switching Overheads
A question raised by Slashdot readers involved my comment that SPARC chips are capable of doing context switches in one machine cycle. MC68XXX chips later than the 68010 had up to 8 CPU registers dedicated to tracking the page addresses for in-memory processes, thus enabling fast switching for a few key processes. This idea was incorporated in the first SPARC designs and became standardized with V9. The 40MHz SuperSPARC could do a full switch in 17ns for up to 4096 processes - which is actually marginally better than one system cycle (25ns) - if the referenced process used fewer than 256 (?) pages of memory. After V9 this limit was effectively removed. For a discussion of this see a review of V9 by Dave Ditzel, then director of Advanced Systems at Sun Labs, in which he describes a solution which yields essentially zero-overhead context switching for most lightweight processes (aka threads).
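The "marginally better than one system cycle" remark follows directly from the clock rate, as this small Python sketch shows. It uses only the 40MHz and 17ns figures quoted above; `cycle_time_ns` is an illustrative helper, not anything taken from Sun documentation.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds for a clock given in MHz."""
    return 1_000.0 / clock_mhz

cycle = cycle_time_ns(40)   # one cycle at 40MHz
switch_ns = 17              # full-switch time quoted in the article
cycles_per_switch = switch_ns / cycle
print(f"cycle: {cycle:.0f} ns, switch: {cycles_per_switch:.2f} cycles")
```

A 40MHz clock gives a 25ns cycle, so a 17ns switch takes about 0.68 of one cycle.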

On the other hand a case can be made for viewing the response in the context set by forty years of effort in developing and enforcing mainframe management methods. The fundamental conflict there lies between the Unix view that the resource is cheap and plentiful and should therefore be pushed out to users versus the mainframe view that the resource is limited and expensive and should therefore be tightly guarded.

It's that conflict which led some list members to respond to my request with honest bafflement. The idea that anyone would run SETI@home is so counter to normal practice that it cannot be understood. One person suggested it would take infinite time because the system would assign it so low a priority that it would simply never be paged in.

To see just how foreign that mindset is from a Unix perspective, think about this: if you send a list management command to LISTSERV@VM.MARIST.EDU you get two responses, not one:

one pertaining to whatever command you sent it; and,
one listing the resources used; viz:
Date: Sat, 18 May 2002 14:37:00 -0400
From: "L-Soft list server at MARIST (1.8d)"
Subject: Re: GET LINUX-390 LOG0205
To: murph@winface.com

> GET LINUX-390 LOG0205

Summary of resource utilization
-------------------------------
CPU time:      0.444 sec            Device I/O:  27
Overhead CPU:  0.300 sec            Paging I/O:  904
CPU model:     9672                 DASD model:  3390
Job origin:    murph@winface.com

Would a Unix manager care? Of course not, but the fact that this is considered good professional practice in the mainframe community perfectly illustrates their mindset.