% fortune -ae paul murphy

iSense? Got'em? SASPO? IEED?

A research engineering company like the one we're imagining would typically be founded by a senior researcher whose name earns the funding request its DARPA audience, and one or two graduate students who see the underlying research as a potentially profitable way of combining their PhD work with service to country. The underlying concept would be simple, but the implications breathtaking: in this example, that explosions like those taking place in a rifle chamber emit both EM radiation and particles - creating spectral and kinetic traces susceptible to remote sensing and analysis.

The scientific basis for research on infrared explosive-emissions detection is that any chemical explosion produces a discontinuous series of emission spectra followed by a dense decomposition plasma, together with the expectation that both can be spotted, tracked, and analysed at a significant distance. Funding, however, is based first on a more practical reality - that the bad guys have to practice too - and second on the rather sanguine expectation that targets can be identified early enough in the attacker's training phase to render them harmless.

As a science demonstration, our hypothetical graduate students would have developed an automated analytical tool capable of accurately differentiating, within half a second, among ten different loads for an ordinary .30-06 hunting rifle at a distance of 50 feet orthogonal to the firing line.
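
To make "automated analytical tool" a little less abstract, here's a minimal sketch - in Python, with every name, bin count, and method invented for illustration rather than taken from any real system - of roughly what such a demo could amount to: match each shot's binned IR spectrum against per-load reference signatures and report the closest one.

    # Hypothetical sketch only: a nearest-centroid classifier of the kind a demo
    # tool might use. Bin counts, data layout, and the matching method are
    # assumptions, not a description of any real system.
    import numpy as np

    N_BINS = 256  # spectral bins per captured shot (assumed)

    def train_signatures(labelled_shots):
        """labelled_shots: iterable of (load_name, spectrum) pairs, each spectrum
        a length-N_BINS array. Returns the mean reference signature per load."""
        sums, counts = {}, {}
        for load, spectrum in labelled_shots:
            sums[load] = sums.get(load, np.zeros(N_BINS)) + spectrum
            counts[load] = counts.get(load, 0) + 1
        return {load: sums[load] / counts[load] for load in sums}

    def classify(spectrum, signatures):
        """Return the load whose reference signature is closest (Euclidean)."""
        return min(signatures, key=lambda load: np.linalg.norm(spectrum - signatures[load]))
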

According to the humour slide with which we can imagine them ending their first DARPA presentation, getting from that demonstration to having a satellite-based system spot terrorists training in Vermont - or differentiate Attackistan civilians attending a wedding from thugs practising for an atrocity - is "just engineering."

Luckily the question that falls to us is simpler: what are the dominant IT problems for this type of enterprise, and how do we meet them?

Notice that the science problems aren't ours to deal with - our job is to provide the tools and our problems are all going to come from the classic twins: too much data combined with not enough time.

Space-borne instrumentation produces enormous volumes of information - so much so, in fact, that finding the microscopic needles we need in the macroscopic haystacks the instruments send us is going to require on-satellite processing, almost instantaneous instrument control, and full transmission for ground-based back-checking.

A back-check system, incidentally, is one which implements the same ideas as the primary system but does so with code written by different people in a different programming language and run on a different OS and hardware environment. Its purpose, in both research and military use, is to ensure that operational errors are quickly detected and correctly attributed to either the code or the core ideas, as the case may be.
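
To make that concrete, here's a hedged sketch of the one piece the two teams would share: the automated comparison of their outputs. The event format, field names, and tolerance are all invented for illustration; the point is only that both pipelines see the same downlinked data and their results get diffed mechanically.

    # Illustrative only: compare event lists produced independently by the
    # primary pipeline and the back-check pipeline. Field names and the
    # matching tolerance are invented for this sketch.
    def compare_runs(primary_events, backcheck_events, time_tol=0.5):
        """Each event is a dict like {"t": seconds, "class": ...}.
        Returns the events the two implementations disagree about."""
        unmatched = []
        remaining = list(backcheck_events)
        for p in primary_events:
            match = next((b for b in remaining
                          if abs(b["t"] - p["t"]) <= time_tol and b["class"] == p["class"]),
                         None)
            if match is None:
                unmatched.append(("primary-only", p))
            else:
                remaining.remove(match)
        unmatched.extend(("backcheck-only", b) for b in remaining)
        return unmatched

A persistent pattern of disagreement points at one implementation's code; agreement on results that later prove wrong points at the core ideas.
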

For example, the original instrument package detected perhaps one photon per million emitted with each shot - but that was at a distance of fifty feet. In production, the system will run at around 110 miles and need a detection threshold roughly twelve orders of magnitude better. Think of it as filtering an Olympic swimming pool to diagnose drug abuse from a drop of blood - once every three to five seconds. In other words, it's safe to predict that the science side will produce a process based on continuous wide-area surveillance triggering closer examination of data from areas where detectable events occur - meaning that the requirement will be for very fast on-satellite processing coupled with local instrument control.
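
Here's a hedged sketch of that predicted structure, with the instrument interfaces reduced to three assumed callables since we don't actually know what they'll look like: scan a coarse wide-area frame, trigger a high-resolution capture wherever something pokes above background, and downlink everything for the ground-based back-check.

    # Sketch only: the trigger loop the science side is predicted to need.
    # Function names, the trigger threshold, and the payload format are
    # assumptions, not a real instrument interface.
    import numpy as np

    TRIGGER_SIGMA = 6.0  # how far above background a candidate event must sit (assumed)

    def survey_loop(read_wide_frame, point_and_capture, downlink):
        """read_wide_frame() -> 2-D array of coarse detector counts
        point_and_capture(region) -> high-resolution data for a small region
        downlink(payload) -> queue data for transmission to the ground segment"""
        while True:
            frame = read_wide_frame()
            background, spread = frame.mean(), frame.std()
            hot = np.argwhere(frame > background + TRIGGER_SIGMA * spread)
            for y, x in hot:
                detail = point_and_capture((y, x))        # local instrument control
                downlink({"region": (int(y), int(x)), "detail": detail})
            downlink({"frame": frame})                    # full transmission for back-checking
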

We do not know in advance, of course, what the actual processing requirements will be, but starting off with these assumptions about structure makes sense largely because they reflect what we know or can guess about the eventual applications - meaning that a key architectural criterion is that software change should have little real cost.

Since that fits the two-part architecture discussed here, it's time to place our bets on implementing this using today's known or near-term technologies.

Since space and power will be the primary constraints on the orbital package, you can't really imagine an x86 grid up there - but Intel is working on thousand-core systems, and every major graphics-board maker has array processors of 64 or more cores working. It would be possible, therefore, to bet on those becoming practical for on-board use.

In contrast, IBM's cell is here now, with enhanced precision on the way, and offers at least comparable performance - and this, combined with the fact that it has considerable software momentum going for it, makes it the obvious low-risk candidate.

Cell means Linux - because IBM has standardised the firmware implementing the cell architecture and the compilers being built around it on Linux.

Notice that choosing Linux on cell as your primary target for the space-borne side of this is a good way of hedging your hardware bets: you can give the researchers something to work with today, knowing that they'll stay away from assembler-level optimisation until late in the process, and therefore that your cost of switching hardware, if a better choice comes up later, will be relatively low.

This isn't true at the back-check and data-storage end for ground-based processing. The drivers there are that the back-check has to use different hardware and OS technologies, and that data volumes will be huge.

To me, as frequent contributor bportlock predicted last week, that means the UltraSPARC T2, Solaris, and ZFS - simply because nothing else comes close to handling data volumes expected to reach easily into the terabytes-per-day range.

There will be a third team working as well: people trying to determine the minimum information needed to distinguish over-enthusiastic deer hunters from terrorists. They'll mostly be using tools like Maple under Mac OS X or Linux with only minor data requirements - none of which will affect our hardware/software decisions until much later in the process.

The day-one infrastructure, therefore, is going to be Linux on cell for on-board processing and Solaris on SPARC for ground processing.

What's most interesting about this is that everyone thinks of our industry as both Wintel-dominated and diverse - but if you extrapolate today's processing environment into the future by imposing requirements that combine high data volumes with very tight space, power, and time constraints, you find that the Wintel players have nothing to offer, and that this whole industry diversity thing comes down to exactly two surprisingly complementary technologies.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.