Front page teaser para:

Sun's DARPA contract for the development of petaflop computing has led to a radical new CPU design. Known internally as "Corona," this design has the potential to revolutionize computing, but development costs could bankrupt the company if Itanium-style failures or delays are encountered.

Page lead extract: "It seems likely, nevertheless, that Sun chairman and CEO Scott McNealy will ask Sun's board to bet the company by approving what amounts to a blank check to partners Fujitsu and Texas Instruments for an attempt to build laboratory prototypes for testing and evaluation."

McNealy to bet the company on Corona CPU

- by Paul Murphy -

The standard chip-making process, based on using lithography to etch circuits into silicon and other materials, has been in use since the mid-seventies, with change expressed mainly as increased manufacturing precision: decreases in the wavelengths used have allowed the development of ever smaller components. Pundits, of course, have been predicting the end of this shrinkage process for a number of years, but so far no limit has been reached on the ability to realize Moore's law by periodically doubling the density of components etched on the chip. Indeed, most of today's advanced CPUs are made at 90nm, with a ramp-up to 65nm production processes underway and laboratories now demonstrating successful 10nm processes.

As early as 1987, however, work at IBM's Thomas J. Watson Research Center in New York demonstrated the feasibility of the opposite approach: using ion deposition to build the chip up from a substrate instead of cutting it into a surface laid down on that substrate. In theory, that technology can be used to replace the usual flat-mat design with a three-dimensional spherical design in which all flow distances are minimized. Such a sphere would be honeycombed with cooling tunnels and would use a single, very high-bandwidth, network-style connector threaded through the sphere in lieu of the traditional edge connectors.

Deposition methods now exist that can build a chip quite literally one atom at a time, with a theoretical density increase in the range of six orders of magnitude relative to X-ray lithography - about 1,000 times more than the increase in component density from the 1979 8086 to today's P4E2. Despite this potential, however, two factors, one technical and one commercial, have kept theory from becoming reality. The commercial factor is fairly simple: it is expected to take at least ten years and a very large number of dollars to build a production-scale plant capable of volume output. Since that planning horizon exceeds the life of a current product generation, costs are largely unknown, and proven alternatives exist, no one has been willing to pioneer this technology for the realization of existing CPU designs despite the distance-cutting advantages of the spherical format.
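
For rough scale, here's a back-of-envelope sketch of that comparison in Python (the transistor counts and die areas are approximate published figures, not numbers from the article):

    # Rough check of the density comparison above. Transistor counts
    # and die areas are approximate public figures, used only to
    # illustrate the scale of the claimed improvement.
    t_8086, area_8086 = 29_000, 33.0         # Intel 8086 (1979), ~33 mm^2 die
    t_p4e, area_p4e = 125_000_000, 112.0     # Pentium 4 Prescott era, ~112 mm^2 die

    litho_gain = (t_p4e / area_p4e) / (t_8086 / area_8086)   # density ratio
    deposition_gain = 1e6    # the article's claim: six orders of magnitude

    print(f"8086 -> P4-era density gain: ~{litho_gain:,.0f}x")
    print(f"claimed deposition gain: ~{deposition_gain / litho_gain:,.0f}x larger")

On those numbers the lithographic gain works out to roughly 1,300x over 25 years, putting the claimed six-orders-of-magnitude jump about three orders of magnitude beyond it - consistent with the "about 1,000 times more" figure above.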

The technical issues are far more complex and interlinked. The most difficult has been that the Riemann equations describing interactions along the edges of the sub-nanoscale devices to be "grown" with this technology don't allow point solutions - meaning that the information flows seem unpredictable, mainly because, at that scale, quantum instability affects everything. As a result, earlier designs have been limited to much larger, nanometer-scale devices to which quantum considerations don't apply but which therefore also don't offer enough of a performance gain to justify the additional manufacturing complexity.

Four years ago, however, a Russian mathematician, Igor Dimitrovich Turicheskiy, working at the M.V. Keldysh Institute of Applied Mathematics in Moscow, provided a breakthrough solution when he showed that the apparent unpredictability of flow directions at these edges could be resolved through relativity theory. Although I don't begin to understand the math, his work apparently shows that the observation that information flows across a quantum-scale device boundary generally don't exit the boundary in the same order in which they entered it - the so-called chaotic flow limit to quantum computing that stopped IBM's Josephson junction effort - is actually a predictable consequence of relativistic distortions of their apparent crossing time. Thus information about the order in which information flows are produced within such quantum assemblies, coupled with knowledge of the electrical properties of the medium, allows complete prediction of their arrival pattern at another component.

This, of course, strikes at the heart of current CPU design limitations, in which the time and voltage needed to drive electrons along internal connectors limit the physical size of the core and thereby give rise to attempts, like Sun's throughput computing initiative, to bypass some of those limits through the use of multiple parallel cores.

Unfortunately, SMP-style "throughput computing" has its own downside: memory requirements increase as a function of throughput. For example, Sun's present top end, the E25K, comes with up to 72 dual-core UltraSPARC IV (US-IV) processors and needs up to half a terabyte of RAM to function efficiently. For the "Jupiter" series planned around the future US-VI processor, that maximum will rise to 72 CPUs each with up to 32 integrated cores - giving the machine the estimated throughput equivalent of a three-terahertz US-III machine but requiring something like 16TB of RAM. With current memory technology such a machine would need over a mile of memory sockets - a clear impracticality even in an ultradense packaging environment such as those IBM plans for its "Blue Gene" series machines.
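
A quick sketch of the socket-length arithmetic (the 1GB-per-module capacity and the standard 133.35mm DIMM edge length are my assumptions, not figures from Sun):

    # Socket-length arithmetic for a 16TB machine, assuming 1 GB DIMMs
    # (typical of the period) and the standard 133.35 mm DIMM edge.
    TB, GB = 2**40, 2**30
    dimms = (16 * TB) // (1 * GB)      # 16,384 modules
    length_m = dimms * 0.13335         # metres of socket edge, end to end
    print(f"{dimms:,} DIMMs -> ~{length_m:,.0f} m (~{length_m / 1609.34:.1f} miles)")

At one module per socket that works out to about 2.2 kilometres - roughly 1.4 miles - of DIMM sockets laid end to end.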

Turicheskiy's mathematics offers a nearly miraculous "double whammy" solution to this. Not only would atomic-scale system assembly enable Sun to place a full terabyte of memory directly within each core, but the time dilation effect experienced by data moving across those boundaries at very nearly light speed offers gigahertz multiplication as an apparently "free" side effect. This balances the mathematics of quantum interchange on the edge of paradox: a clock that registers 1GHz internally appears to run at about 12.5GHz when viewed from outside, providing a full order of magnitude of apparently "free" throughput improvement.
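
As a sanity check on that factor (standard special relativity, not Turicheskiy's derivation): a 12.5x apparent clock multiplication corresponds to a Lorentz factor of

    \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = 12.5
    \quad\Longrightarrow\quad
    \frac{v}{c} = \sqrt{1 - \frac{1}{12.5^2}} \approx 0.9968

so the data would indeed have to cross those boundaries at about 99.7 percent of light speed, consistent with the "very nearly light speed" claim above.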

Of course, in reality physics does not allow for "free energy," and such a machine would come to a stop about 12 seconds after start-up if the designers didn't provide an additional energy source. In this case the free electrons needed will come from the use of a liquid near-superconductor as both coolant and network bus. Circulated through the spherical CPU using pressure generated by a nanoscale Stirling engine powered by waste heat, this material remains electrically continuous and offers nearly infinite bandwidth as it flows through both the CPU and its external connectors to disk and network resources.

At present no one really knows what it will cost, or how long it will take, to develop the machines that will build the machines that will make these kinds of systems. Certainly DARPA's initial $50 million contribution barely covered the cost of the design software needed for three-dimensional component layout. The buzz among Sun board members is, nevertheless, that chairman and CEO Scott McNealy will ask the board to bet the company on this technology by approving what amounts to a blank check to partners Fujitsu and Texas Instruments for an attempt to build a laboratory prototype for a future Sun Chronosphere CPU series.

If, as seems unlikely given the publication date, this rumor proves to be true, the effort will mark an enormous gamble for Sun and its always mercurial chairman but could give the company control of the entire computing universe for decades to come.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 20-year veteran of the IT consulting industry.