% fortune -ae paul murphy

Hey, I love you too, guy

Here's a comment from not so frequent contributor "SO.CAL Guy":

murph_z you should be the last person calling someone a zealot. your the biggest zealot on zdnet. your blog is near the bottom of the list for a reason.

you don't write about anything that someone would want to read about. every post i've ever read of yours which i have to admit is not many. you are ether calling everyone who does not hold your beliefs about software stupid. or calling anyone who uses windows a moron and post a link to try and back up your uninformed ideas about what stupid is. i could go on and on but i won't.

heres some advice write about tech the good or bad. not about how everyone but you is a dumbass and should be overjoyed that you would give us a few words from your all knowing intellect.

just food for thought.

Umm, the technical (i.e. non-SCO-related) zdnet blog comment I've been most reviled for is my February 2006 prediction that just multiplying out the per-thread megahertz for both Xeon and Sun's then-pending UltraSPARC T1 would provide a reasonable guide to relative performance. Thus my conclusion at the time was that "it would take somewhat more than eight 3.2GHz Intel Xeons to match one UltraSPARC T1 at 1.2GHz".

As I noted nearly a year later, this idea didn't exactly get rave reviews:

Once upon a time, and in a lab far far away, there was a little machine rejoicing in the name of "atchewi". Atchewi lived for throughput, and as a result I started predicting that a single 1.4GHz UltraSPARC T1 would offer rough performance equivalence, for non floating point intensive tasks, to a hypothetical 44.8GHz Xeon - and people pretty much unanimously thought I was nuts.


When Sun started selling the machine they offered it with four, six, or eight working cores running at either 1 or 1.2GHz, so I was wrong about the clock rate - but multiple benchmarks carried out by customers around the world have shown that the T1's performance does indeed roughly match Xeon on a cycles per thread basis - i.e. that a 1GHz, four core, T1 offers roughly a 16GHz Xeon equivalence (provided there's no floating point component) while an eight core, 1.2GHz "Coolthreads" system runs some jobs at very nearly the rate you'd expect from a hypothetical 38.4GHz Xeon.
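For anyone who wants to check that arithmetic, here is a minimal Python sketch of the cycles-per-thread rule of thumb. It assumes four hardware threads per T1 core (32 threads on an eight core chip); the function name is made up for illustration, and the rule is only a rough heuristic for non floating point work, not a benchmark.

```python
def xeon_equivalent_ghz(cores, clock_ghz, threads_per_core=4):
    """Rule-of-thumb aggregate clock: cores x threads per core x GHz.

    threads_per_core=4 is an assumption matching the T1's 32 threads
    on eight cores; this is a heuristic, not a measured result.
    """
    return cores * threads_per_core * clock_ghz


# The configurations mentioned in the text: (cores, clock in GHz).
configs = {
    "T1, four cores at 1.0GHz": (4, 1.0),
    "T1, eight cores at 1.2GHz": (8, 1.2),
    "T1, eight cores at 1.4GHz (the original guess)": (8, 1.4),
}

for name, (cores, clock) in configs.items():
    total = xeon_equivalent_ghz(cores, clock)
    print(f"{name}: about {total:.1f} Xeon-equivalent GHz")
```

The three cases correspond to the 16, 38.4, and 44.8 figures quoted above.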

Not that many people bought into this then either; but, as regular readers know, I'm not that easily persuaded of the error of my ways, and went on to dig the hole deeper:

Today [Dec 11/2006] what I want to do is venture another absurd prediction: that the floating point performance for the forthcoming second Niagara generation will be as much a surprise to the general IT community as the T1's character pushing performance has been. Specifically I think it will perform about like a T1 on workloads with very many small jobs, do about a third better on workloads requiring more extensive processing, and astonish everyone by showing no significant drop in throughput as active threads become increasingly floating point intensive.

The reason for that goes far beyond the addition of seven floating point cores: with Niagara2 Sun puts more of the machine on the chip -memory controllers, dual 10Gb/s networking, hardware cryptology. Combine the hardware with Solaris/ZFS and what you get is a recipe for world-beating RDBMS performance.

At the time, this prediction was considered so over the top that even Sun's president Jonathan Schwartz evinced some wry cynicism about it in his blog:

The first reaction most folks have to the performance is, frankly, disbelief. A while back I got into a spat with the technologists that built the machine about whether we could fairly call them 9.6Ghz machines (as a measure of clock frequency of the chip). Paul Murphy has an interesting analysis of whether that's a fair descriptor (I say interesting because he says we're underhyping the performance - a first for the industry!).

So now, of course, here it is January 2008, the machines are widely available, and it's possible to check the prediction against reality.

Luckily for me, it's been a bit of a no-brainer: as these benchmark results show, relatively low-end Niagara machines rather easily beat eight Xeon cores pretty much across the board - it's even 14% faster on SPEC's IntRate_2006 throughput than Intel's latest and greatest Quad-Core "Xenia" X5460 (3.16GHz 1333MHz 12MB 120W) in an HP DL360 G5.

Basically the only benchmarks it doesn't own are the ones Sun hasn't had a chance to post a result for yet.

Some benchmark results, however, are more interesting than others. Consider, for example, a workload-bracketing pair starting with an LDAP benchmark reported on the Sun directory manager blog.

From the introduction:

Sun T2000 vs Dell 6850 Revisited

Last month, I wrote about a demo that we presented in Austin comparing LDAP authentication performance on the Sun Fire T2000 server (one UltraSPARC® T1 processor at 1.0 GHz and 32GB DDR2 memory) with that of the Dell PowerEdge 6850 server (four dual-core Intel® Xeon® EM64T processors at 3.2GHz and 32GB DDR2 memory), which is about the best that Dell has to offer. You can read that post for the details, but in short the Sun server (which is cheaper, smaller, and consumes a lot less power than the Dell system) won the race pretty handily.

And four sample entries from his extended results summary:

System Type         | Operating System | Number of User Entries | Average LDAP Authentications per Second
Dell PowerEdge 6850 | Windows          | 250,000                | 2,984
Dell PowerEdge 6850 | Solaris          | 250,000                | 4,375
Dell PowerEdge 6850 | Solaris          | 10,000,000             | 2,800
SunFire T2000       | Solaris          | 10,000,000             | 4,457

Notice first that the Xeon did more than a third better under Solaris than under Windows, and second that on the 10,000,000 entry directory the 1GHz T2000 (1GHz x 32 threads = 32 (Xeon) GHz?) did about 1.6 times the work the Dell did with eight Xeon cores at 3.2GHz (= 25.6 (Xeon) GHz).
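Those ratios are easy to check against the four sample entries. The following is a small Python sketch that computes them and sets the measured T2000 advantage beside what the thread-GHz rule of thumb would suggest. The variable names are illustrative only, and the thread-GHz figures treat each Xeon core as one thread, which is an assumption.

```python
# Average LDAP authentications per second, copied from the four sample
# entries. Keys are (system, operating system, number of user entries).
results = {
    ("Dell 6850", "Windows", 250_000): 2984,
    ("Dell 6850", "Solaris", 250_000): 4375,
    ("Dell 6850", "Solaris", 10_000_000): 2800,
    ("T2000", "Solaris", 10_000_000): 4457,
}

# Same Dell hardware, same directory size, different operating system.
solaris_vs_windows = (
    results[("Dell 6850", "Solaris", 250_000)]
    / results[("Dell 6850", "Windows", 250_000)]
)

# Same operating system and directory size, different hardware.
t2000_vs_dell = (
    results[("T2000", "Solaris", 10_000_000)]
    / results[("Dell 6850", "Solaris", 10_000_000)]
)

# Rule-of-thumb expectation: aggregate thread-GHz on each side.
t2000_thread_ghz = 1.0 * 32  # 1GHz x 32 hardware threads
dell_thread_ghz = 3.2 * 8    # 3.2GHz x 8 cores (assumes one thread per core)
predicted_ratio = t2000_thread_ghz / dell_thread_ghz

print(f"Solaris vs Windows on the Dell: {solaris_vs_windows:.2f}x")
print(f"T2000 vs Dell, measured:        {t2000_vs_dell:.2f}x")
print(f"T2000 vs Dell, thread-GHz rule: {predicted_ratio:.2f}x")
```

On these numbers the measured T2000 advantage comes out larger than the simple thread-GHz ratio.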

The second half of the pair is a recent result for the second generation T2 reported by Phil Harmon under the title Niagara 2 memory throughput according to libMicro.

He provides a technical discussion plus pointers to the benchmark source, but the bottom line comes in a comparison showing a full T2 (8 cores, 1.4GHz) maintaining 267 million memory reads per second - 3.1 times the 86 million achieved by a four-socket (16 core?) Intel "Tigerton" at 2.93GHz.

These two results nicely exceed my early prediction - and bracket both Niagara generations and workload characteristics.

These results seem to hold across the board - look at a broad variety of business and academic benchmarks and you see both Niagara generations blowing everything else away on metrics like SWaP while beating from four to sixteen Xeon cores, depending on configuration and workload, on typical business tasks.

So what's my bottom line? Well, So.Cal Guy, zealots are by definition always wrong, and the one claim I've received the most criticism for has pretty much worked out. So, sure, I love you too; but next time, maybe you should do some homework before spouting off - ok?

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.