Draft Blog Entries

% fortune -ae paul murphy

The T2 and media reaction

The UltraSPARC T2 announcements back on August 7th consisted of two separate stories. The first of these involved Sun's plan to sell it into the commodity processor market. Unfortunately almost nobody in the press understood either the announcement itself or where Sun's taking this, with the straight up treatment given it by CNET's Michael Kanellos about as good as it got. Some key bits from that report:

The company on Tuesday plans to announce its new UltraSparc T2 microprocessor, along with plans for servers based on the chip. Sun plans to insert the UltraSparc T2, an eight-core, 64-thread microprocessor, into servers that will hit the market in the second half of the year.
But it will also sell the chip to third-party manufacturers of storage equipment, networking devices, set-top boxes and other computers.
"We don't want to limit ourselves to the server market. The server market won't grow nearly as fast as the storage or networking market," Jonathan Schwartz, Sun's chief executive, said in an interview. "While we are making them, we might as well make them general-purpose enough to sell them to the broader market."
To accommodate these various markets, Sun will also introduce different versions of the T2; some will have fewer cores, and some might consume less energy.
The full-fledged UltraSparc T2 will cost less than $1,000 (£493), a steep price in any chip market, while the simpler designs will cost less. Sun is already speaking to potential customers, such as several networking and storage companies in different parts of the globe, Schwartz said.

Unfortunately the second half of the story mixes reporting with the reporter's own unexamined and broadly incorrect certainties about the industry - as in, for example, his attempt at establishing some historical context for his readers:

Sun tried to sell UltraSparc chips to other server makers in the early to mid 1990s, but the company retrenched. Since then, UltraSparc chips have been used by other manufacturers, Tadpole Computer has sold Sparc-based laptops, but Sun has not pushed this part of the business heavily.

In reality Sun has sold, and continues to sell, SPARC into the embedded processor market, but the nineties strategy that produced today's APL processors and the Fujitsu-Siemens UltraSPARC servers was based on trying to grow a clone market via licensing, not through open sourcing. As a result the equivalence implied in the comparison denigrates Sun's current strategic commitment to open sourcing both Solaris and the CMT hardware.

Beyond this, he goes on to illustrate three of the four most frequent errors I found in tech press coverage of Sun's announcements:

assuming that an intuitive understanding of x86 multi-core parallelism applies to the T2
Unlike the x86 designs, the T1 does not get its performance advantage mainly from core parallelism: it gets its advantages mainly from the machine's ability to treat all off-register resources equally while switching process states between them at near zero cost.
Right now, however, the point is that a T2 with a single working core set (logic, floating point, cryptology, and packet management) at 1.4GHz would not be performance competitive with a 3.2GHz Xeon on single threads requiring little memory, but would show a shorter average response time on high throughput network workloads like diversified web services - and would make the Xeon look like a 386/25 on specialty jobs that traditionally bottleneck on both cryptology and packet management.
Parallelism does, of course, apply: with eight working cores a T2 can run eight threads in absolute lockstep - but it's the near zero cost context switch and memory management technology that elevates this to 64 seemingly concurrent application threads and makes the T2 look like a real 64 processor machine to Solaris, Linux, and users.
Note, however, that there is another, and potentially more important, level of parallelism within the co-processors. Cryptology support, for example, actually comes from two compute units that can be run in parallel with each other and with whatever processing happens in the calling thread - but while both the latest Solaris 10 kernels (including ZFS) and the Sun Studio 11 tools offload PKCS and related processing, the big performance benefit possible from this has to wait for application developers willing to use it - meaning that the opportunity could go the way of SPARC's SIMD instruction set: enormously powerful, but widely ignored.
failing to differentiate threads from processes and processors
This is the single core variant of the error discussed above. People who silently assume that the natural order of things features one x86 core running one application tend to interpret Sun's 64 thread announcement in that context and assume, therefore, that the T2 will run 64 OS ghosts each with one application. Here's Kanellos again:

Sun executives and engineers will show off benchmarks and other data on the new chip at an event in Austin, Texas, this week. The UltraSparc T2 will have eight cores, with each core capable of managing eight threads. Because each thread on each core can handle an operating system, a single chip can therefore run 64 operating systems simultaneously.

In reality there is no relationship between the ability to run concurrent threads and the ability to run ghosted environments - you can run 64 Solaris containers on a ten year old Sun Ultra2 workstation if you want to - while the T2's hypervisor limit of 64 on LDoms responds to market expectations created by IBM's partial CPU licensing rather than any necessary relationship to the T2's 64 switched threads.
The bottom line on this is simple: the T2 is broadly speaking a 64 way SMP machine built on one chip, not several x86 cores with shared memory access (like AMD) or several uni-processors in the same package (like Intel).
cost
Sun announced that the high end part, with eight working core sets at 1.4GHz, would be offered on the commodity market at less than $1,000. Almost universally, the tech press took that as $1,000 and declared it too high relative to Intel's low end pricing. In reality, however, almost everything about this comparison is bogus.
First the expected price is not $1,000 per unit. Here's the applicable bit from Sun's press release:

The UltraSPARC T2 processor is available in production quantities this quarter, with prices starting well below $1,000, and licensing options wide open for derivative works.

Second, the comparison should be to Intel's higher end processors - not its lower end machines. According to Intel's July 29, 2007 price list, the Itanium® 2 model 9050 processor at 1.6GHz costs $3,692 in box sets of 1000 processors - meanwhile the Core 2 "extreme" at 2.93GHz is shown at $999 and the X5355, a 2.66GHz Xeon, is expected to drop from $1,172 in July to $744 in August.
Of the first ten different stories returned by Google News as mentioning T2 pricing, only Clay Ryder's story on theregister.co.uk got this about right, with none of the others veering from the Wintel party line - even Kanellos rewrites the bit about pricing "starting well below $1,000" to insert an editorial message of his own, saying that the "T2 will cost less than $1,000 (£493), a steep price in any chip market."
1.4GHz times eight cores = 11.2GHz.
This is true, but wrong. It's true in the arithmetic sense and true in that you can't get more than 11.2GHz worth of completed instructions per second out of an eight core, 1.4GHz, CPU. It's also completely wrong because the hidden assumption is that we know what a GHz amounts to - and the implicit standard of comparison for this is the x86 GHz.
Unfortunately most people haven't a clue how little gets done during the typical billion x86 cycles - because most people simply equate instructions to cycles, don't know about context switch costs, and haven't any idea how much time gets spent doing no-ops or housekeeping. In fact, however, the typical x86 processor running at 100% utilization is spending less than 15% of its time doing useful work - with more than 85% committed to waiting for memory, managing its own infrastructure, or handling pre-compute tasks like network packet identification and ordering.
Since the T2 architecture eliminates most of the reasons the Xeon has to waste most of its time on non productive tasks, equating one T2 processor set cycle (including floating point, cryptology, and network management co-processors) to one Xeon cycle is inappropriate. What the ratio should be depends largely on the workload - for discontinuous, high throughput, tasks like web serving or Java based application virtualization each of the hardware's 64 time slots should probably be counted as a Xeon equivalent, giving a rough equivalency of 64 times 1.4GHz, or 89.6 x86 GHz.
There's an odd corollary to this: you'd think that some apparently continuous tasks like linpack work would represent the opposite case to the web workload, but experience with the T1 suggests that this is wrong. Instead most real world linpack processing seems to stall on memory, meaning that the T2 may prove far more effective for this type of work than most people expect.
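
For anyone who wants to check the GHz arithmetic in that last point, here's a minimal Python sketch. The function names are made up for illustration, the 3.2GHz Xeon is just the comparison part mentioned earlier, and the 15% figure is the upper bound claimed above for useful x86 work, not a measurement - so treat the output as a back-of-envelope estimate and nothing more.

```python
# Back-of-envelope GHz arithmetic. Every input comes from the post itself;
# the function names are hypothetical and not part of any Sun or Intel tool.

def nominal_ghz(cores: int, clock_ghz: float) -> float:
    """Naive aggregate: clock speed multiplied by core count."""
    return cores * clock_ghz


def slot_equivalent_ghz(cores: int, threads_per_core: int, clock_ghz: float) -> float:
    """Rough rule for web-style workloads: count every hardware
    thread slot as one x86-equivalent processor at the same clock."""
    return cores * threads_per_core * clock_ghz


def useful_ghz(clock_ghz: float, useful_fraction: float) -> float:
    """Clock rate scaled by the fraction of cycles assumed to do useful work."""
    return clock_ghz * useful_fraction


if __name__ == "__main__":
    # Eight cores at 1.4GHz: the "true, but wrong" number.
    print(round(nominal_ghz(8, 1.4), 1))
    # Eight cores x eight threads at 1.4GHz: the 64-slot estimate.
    print(round(slot_equivalent_ghz(8, 8, 1.4), 1))
    # A 3.2GHz Xeon at the claimed 15% upper bound on useful work.
    print(round(useful_ghz(3.2, 0.15), 2))
```

The three functions are kept separate on purpose: the first is the number the press quoted, the second is the workload-dependent estimate, and the third shows why a raw x86 clock rate overstates what actually gets done.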

So what's the bottom line? Simple: most of the tech press reports about the T2 announcement strayed from the facts announced in the press releases, and when they did, they got most of it wrong.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.