Almost two years ago I predicted that Sun's then-pending T1 and subsequent T2 CMT machines would offer throughput roughly comparable to x86 machines producing the number of cycles you'd get by multiplying Sun's CMT megahertz by their thread count, i.e. that a T1 would produce roughly the throughput of eight to ten Xeon cores on work that isn't floating-point intensive. Although that turned out to be somewhat pessimistic for purely integer workloads, reality also showed that the T1's floating-point limitations had somewhat more impact than expected, i.e. that floating point matters to more applications than I expected.
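For anyone who wants to redo that back-of-envelope estimate, here is a minimal sketch of the rule of thumb. The specific numbers are my assumptions for illustration, not measurements: a 1.0 GHz T1 with 8 cores of 4 threads each (32 threads), and Xeon clocks of 3.2 and 3.6 GHz. The function name `xeon_core_equivalents` is made up for this example.

```python
def xeon_core_equivalents(cmt_ghz, cmt_threads, xeon_ghz):
    """Rule of thumb only: CMT clock times thread count, divided by the x86 clock."""
    aggregate_ghz = cmt_ghz * cmt_threads  # "megahertz by thread count"
    return aggregate_ghz / xeon_ghz


# Assumed figures: 1.0 GHz T1 with 32 threads; illustrative Xeon clocks.
for xeon_ghz in (3.2, 3.6):
    cores = xeon_core_equivalents(1.0, 32, xeon_ghz)
    print(f"vs {xeon_ghz} GHz Xeon: about {cores:.1f} core equivalents")
```

With those assumed clocks the estimate lands in the eight-to-ten range; it says nothing about floating point, which is exactly where the heuristic fell short.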
The T2 generation adds full floating-point support to each core and therefore remedies this T1 limitation, making it rather easily, as shown in the various benchmarks recorded by Sun's "bmseer" people, the highest-throughput chip in volume production.
However, there are two related performance issues which I don't think are getting the attention they deserve. The easy one relates threads to CPUs and addresses the misconception that a multi-threaded 1.4GHz CPU has to be less appropriate to data center use than a higher-speed single-thread CPU like an Opteron or Xeon. Here's a bit from that same bmseer blog on this issue:
Can I use 64 threads in a chip?
Can someone really use 64-threads in a chip? The answer is simple, when you look out into your datacenter do you see racks of servers or just a single naked core sitting alone in the back corner?
If you see racks of servers you are running lots and lots of threads. Think of it this way, if you have a bunch of dual-core single-socket 1RU servers filling a rack you have around 80 threads in a rack, or 2-socket you have 160, or quad-core 2-socket you have 320 threads.
Now how would you judge performance of a single rack (with 80-320 threads)? Would you run one copy of "gzip" or "tar" and compare that to your laptop and say that rack is slow? Of course not. You'd run a whole bunch of them.
So when you are performance testing an UltraSPARC T1 or UltraSPARC T2 server throw lots of work at it and it will have no problem. There is massive parallelism in every datacenter with racks of servers. Perfect for UltraSPARC T1/T2. Every datacenter with web-tiers, application-tiers, and database behind those tiers runs tons of threads. And remember the UltraSPARC T1 at introduction, and even last week, continues to set leading performance records at every tier.
Intelligence test: Would you judge performance of an UltraSPARC T2 by running a single "gzip" or "tar"?
That last line brings up the second issue - one I've not seen anyone talking about - probably because the right answer is "Yes, I would."
That may sound like a dumb answer, but it isn't, because the T2 has integrated hardware cryptography and 10Gb Ethernet - meaning that if you're storing encrypted data across the network this thing will easily outperform the fastest Xeons on a per-thread basis.
Right now not that many people do a good job of using encrypted data storage - but here's a thought: a second generation "thumper" (X4500) could replace those two Opterons with a single T2 and deliver 48TB of fully encrypted, NFS-mountable ZFS data at 10Gb wire speeds.
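To put that wire speed in perspective, here is a rough sketch of how long it would take to stream the whole 48TB over one 10Gb link. It assumes decimal terabytes, a single fully saturated link, and no protocol, encryption, or disk overhead, so treat the result as a lower bound rather than a benchmark. The helper name `transfer_hours` is hypothetical.

```python
def transfer_hours(capacity_tb, link_gbps):
    """Idealized time to move capacity_tb terabytes over a link_gbps link."""
    capacity_bits = capacity_tb * 1e12 * 8  # decimal TB to bits (assumption)
    link_bits_per_s = link_gbps * 1e9       # line rate, zero overhead (assumption)
    return capacity_bits / link_bits_per_s / 3600


print(f"{transfer_hours(48, 10):.1f} hours to stream 48TB at 10Gb/s")
```

Under those assumptions the full array takes a bit under eleven hours to read end to end, which is why doing the encryption in hardware at line rate matters.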