% fortune -ae paul murphy

The Megahertz Myth and the UltraSPARC T1

Until quite recently Intel's basic sales pitch on its PC CPUs focused on the notion that higher megahertz rates automatically produce improved throughput and most of the PC market accepted that as reasonable. It wasn't, of course; but Apple has lost significant market share over this bit of misrepresentation and Macintosh users worldwide have always been a bit defensive over the lower cycle rates achieved by PowerPC products like the G4/G5 series.

Today, of course, we have twin reversals: Apple falling to Intel, and Intel trying to pretend that its newfound commitment to throughput and lower-power computing with the Pentium M and its "Yonah" successors doesn't amount to a repudiation of its own megahertz marketing mythology.

Next week I hope to have real numbers on the specific issue of G4/G5 performance versus that of Intel's new "Duo" line - and it's not looking good for Intel. Meanwhile, however, there's something interesting to be learned by looking at the extremes: comparing Intel's highest-revving CPUs against Sun's first go-round on its ultimate throughput machine, the CMT UltraSPARC T1.

Here's how spec.org describes the SPECweb2005 benchmark:

SPECweb2005 is the next-generation SPEC benchmark for evaluating the performance of World Wide Web Servers. As the successor to SPECweb99 and SPECweb99_SSL, SPECweb2005 continues the SPEC tradition of giving Web users the most objective and representative benchmark for measuring a system's ability to act as a web server. In response to rapidly advancing Web technology, the SPECweb2005 benchmark includes many sophisticated and state-of-the-art enhancements to meet the modern demands of Web users of today and tomorrow:

As of February 5th, 2006 there are eleven posted results, two of them with the data blanked out. Of the remaining nine, six are from IBM and five of those feature Intel processors. The fastest Intel machine shown is a Dell PowerEdge 2850 with two dual core 2.8GHz Xeons, which scored 4,850, or about 2.3 cycles per point (here meaning aggregate megahertz across all cores divided by the score). The fastest machine built on single core processors, however, was an IBM eServer x346 with two Intel Xeons at 3.8GHz, which achieved a score of 4,348 - about 1.75 cycles per point.

Of the seven Intel results, three relate to the Pentium IV, Pentium M, and Pentium D, respectively, and are easily outperformed by the four Xeon results. In total the Xeons, all with SUSE Linux, the Zeus Web Server, Apache Tomcat, and hyperthreading turned on, produced a combined score of 17,612 using 33.6 billion machine cycles per second - about 1.91 cycles per point.

There is one result for an IBM p550, again with the SUSE/Zeus/Tomcat software, running on two dual core Power5+ processors at 1.9GHz. It reached a score of 7,881 - meaning that the Power5+ needs about 0.96 cycles per point.

Sun has posted only one result: 14,001 (or about 0.69 cycles per point) on a T2000 with a single eight core, 1.2GHz UltraSPARC T1.
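The cycles-per-point figures can be checked with a few lines of Python. This is only a rough sketch using the scores and clock rates quoted above; the function name and the tuple layout are illustrative assumptions, not anything published by SPEC.

```python
# Cycles per point as used in this column: aggregate megahertz across
# all cores, divided by the SPECweb2005 score.
# The function name and data layout are assumptions for illustration.

def mhz_per_point(chips, cores_per_chip, ghz, score):
    """Return aggregate megahertz per SPECweb2005 point."""
    total_mhz = chips * cores_per_chip * ghz * 1000
    return total_mhz / score

# (chips, cores per chip, clock in GHz, score), as quoted in the text
systems = {
    "Dell PowerEdge 2850, Xeon 2.8GHz": (2, 2, 2.8, 4850),
    "IBM eServer x346, Xeon 3.8GHz": (2, 1, 3.8, 4348),
    "IBM p550, Power5+ 1.9GHz": (2, 2, 1.9, 7881),
    "Sun T2000, UltraSPARC T1 1.2GHz": (1, 8, 1.2, 14001),
}

ratios = {name: mhz_per_point(*cfg) for name, cfg in systems.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} cycles per point")
```

Lower is better here: a smaller number means fewer clock cycles spent per unit of benchmark score.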

So if we blandly assume that the Xeon results would scale linearly, this suggests that it would take somewhat more than eight 3.2GHz Intel Xeons to match one UltraSPARC T1 at 1.2GHz. Similarly, it would take roughly four dual core IBM Power5+ processors at 1.9GHz.
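For anyone who wants to redo that estimate, here is a minimal sketch under the same bland linear-scaling assumption. It pools the four Xeon results (33.6 billion cycles per second for a score of 17,612) and treats each 3.2GHz Xeon as a single core; the constant and function names are mine, chosen for illustration.

```python
# Linear-scaling estimate: how much competing hardware would be needed
# to reach the T2000's score. Names are illustrative assumptions.

T1_SCORE = 14_001

XEON_POOLED_MHZ = 33_600      # four Xeon results combined
XEON_POOLED_SCORE = 17_612

P550_MHZ = 2 * 2 * 1_900      # two dual core Power5+ at 1.9GHz
P550_SCORE = 7_881

def mhz_to_match(target_score, ref_mhz, ref_score):
    """Megahertz the reference platform needs to reach target_score,
    assuming score grows linearly with aggregate clock rate."""
    return target_score * ref_mhz / ref_score

xeon_mhz = mhz_to_match(T1_SCORE, XEON_POOLED_MHZ, XEON_POOLED_SCORE)
power_mhz = mhz_to_match(T1_SCORE, P550_MHZ, P550_SCORE)

# Assumption: one 3.2GHz Xeon counts as a single 3,200MHz core.
xeons_needed = xeon_mhz / 3_200
# One dual core Power5+ processor contributes 2 x 1,900MHz.
power_chips_needed = power_mhz / (2 * 1_900)

print(f"3.2GHz Xeons needed: {xeons_needed:.1f}")
print(f"Dual core Power5+ processors needed: {power_chips_needed:.1f}")
```

Real systems rarely scale linearly, so these counts are best read as lower bounds rather than sizing guidance.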

So why is this interesting? For two reasons: first, because that roughly 2:1 ratio in cycles per point for Xeon to PPC crops up a lot in other benchmark results; and second, because this illustrates the utter dominance of the "slow" CMT approach over higher megahertz on multi-threaded tasks.

There are some interesting implications here. One of the most subtle, and most important, relates to the competitive advantage the Java Virtual Machine offers Sun in appealing to developers. By using its own JVM and Java server software on its test machine, Sun demonstrated that the JVM could be used as an easily accessible intermediate technology to let developers take advantage of CMT hardware without doing much additional coding.

There are also far more trivial consequences, including one I happen to be interested in right now. One of the things that's going on in this benchmark is that the Xeons spend a lot of their time just waiting for memory - and, in fact, the faster they go, the higher the percentage of time they spend doing no-ops. Think about this in terms of marketing claims about gigahertz and you can see that Sun's occasional description of the T1 as a 9.6GHz machine (because 8 x 1.2 = 9.6) can understate reality by a factor of nearly three - since it takes about 27 Xeon GHz to match it - and the Xeons are running without the JVM overhead.

In other words, Sun could reasonably claim a kind of "cycle equivalence" on web services for its T2000 at about 27GHz in comparison to Xeon (and nearly 44GHz in comparison to the Pentium IV). Now "cycle equivalence" is not a concept I'd like to defend in the abstract, but I think it gives a rough and ready indicator of just how revolutionary this whole chip multi-threading idea really is - and that 1.75:1 ratio provides a good estimate for the low end of the range on Intel vs. PPC cycle equivalence on work done.
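To make the "cycle equivalence" idea concrete, here is a small sketch that compares the T1's nominal gigahertz with the Xeon gigahertz needed to post the same score. It uses only the pooled Xeon figures quoted earlier; the function and variable names are hypothetical, and the whole thing rests on the same linear-scaling assumption as the rest of this column.

```python
# "Cycle equivalence": the reference platform's gigahertz needed to
# equal a given score, compared with the nominal clock-rate sum.
# Names are illustrative; linear scaling is assumed throughout.

def cycle_equivalence_ghz(score, ref_ghz, ref_score):
    """Reference-platform GHz required to reach `score`."""
    return score * ref_ghz / ref_score

nominal_ghz = 8 * 1.2                      # eight cores at 1.2GHz
equivalent_ghz = cycle_equivalence_ghz(
    score=14_001,                          # T2000 result
    ref_ghz=33.6,                          # pooled Xeon clock rate
    ref_score=17_612,                      # pooled Xeon score
)
understatement = equivalent_ghz / nominal_ghz

print(f"Nominal: {nominal_ghz:.1f}GHz")
print(f"Xeon-equivalent: {equivalent_ghz:.1f}GHz")
print(f"Ratio: {understatement:.2f}")
```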


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specialising in Unix and Unix-related management issues.