% fortune -ae paul murphy

Intel's 80 core CPU

Here's a bit from the EETimes report:

Intel's researchers have produced an 80-core chip that uses less energy than a quad-core processor and has teraflop performance capabilities.

Researchers have built the prototype to study how best to make that many cores communicate with each other. They're also studying new designs for cores and new architectural techniques, according to Manny Vara, a technology strategist with Intel's R&D labs. The chip is just for research purposes and lacks some necessary functionality at this point, but Vara says Intel will be able to produce a chip with 80 cores in five to eight years.

Sounds pretty cool, but I think it actually demonstrates just how far out of touch Intel has let itself get.

On the obvious side, every advanced GPU has more than 80 cores - and their software mostly works now. More importantly, however, Intel's product is still five years out, there's no software for it, Intel's compiler research has tanked, and IBM's Cell patent blocks its path because that patent is precisely about how you make a general purpose grid work in silicon.

Cell is, furthermore, available today - and you can get nine of them plus a steak dinner for the price of a single 2.4GHz Intel Core 2 Quad.

Most importantly, however, Intel is promising a teraflop on a chip in 2012 - but the dual Cell Mercury Computing blade IBM sells as the QS20 offers 400 GigaFlops of sustainable single precision performance today - and IBM has been working on the compilers. Here's a bit from a paper accepted for the IBM Journal almost a year and a half ago:

Our Cell BE compiler implements SPE-specific optimizations, including support for compiler-assisted memory realignment, branch prediction, and instruction fetching. It addresses fine-grained SIMD parallelization as well as more general OpenMP task-level parallelization, presenting the user with a single shared-memory image through compiler-mediated partitioning of code and data and the automatic orchestration of the data movement implied by this partitioning. Using benchmarks suitable to this platform, we demonstrate average speedup [relative to standard compilers] factors of 1.3 for SPE-specific optimizations, 9.9 for SIMDization, and 6.8 for task-level parallelization.
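To make "compiler-mediated partitioning of code and data" a little more concrete, here is a rough sketch of the idea in Python. It is not taken from the IBM paper and has nothing to do with the real Cell toolchain: the function names are made up for illustration, the worker count of 8 simply mirrors the eight SPEs on one Cell chip, and ordinary Python threads stand in for SPEs (so don't expect a real speedup from it). The point is only the shape of the thing: the caller sees one call on one array, while the splitting, farming out and recombining happen underneath.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # assumption: one worker per SPE on a single Cell chip


def partition(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def scaled_sum(chunk, scale):
    """Per-worker kernel: the kind of tight loop a SIMDizing compiler targets."""
    return sum(scale * x for x in chunk)


def parallel_scaled_sum(data, scale, workers=NUM_WORKERS):
    """Partition the data, run the kernel on each chunk, combine the partials."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(scaled_sum, chunks, [scale] * len(chunks))
        return sum(partials)


if __name__ == "__main__":
    values = list(range(1000))
    print(parallel_scaled_sum(values, 2))
```

What the paper claims, and what this toy only imitates by hand, is that the compiler does the partitioning and data movement for you from ordinary OpenMP-style source.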

That same EETimes article quotes the redoubtable Rob Enderle describing Intel's effort as "revolutionary" - but compare it to PPC/Cell and I think a different word applies: pathetic.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.