% fortune -ae paul murphy

Reliability: the biggest challenge of all

I remember getting my first Sun 490 with a 64-disk attached array - it seemed huge at the time, even though it was physically much smaller than the 2GB DEC array it replaced, because it was filled with enough of those tiny (5.25") 204MB disks to give us the unimaginable freedom of 12.6GB of accessible space. We thought we'd never fill it, and only mentioned that 12.6GB number in hushed voices.

The biggest problem with that gear was reliability: there was no ZFS, disks squealed continuously and died regularly, and the RAID card overlay was no more reliable than Sun's own disk manager. I never knew which I needed more: replacement disks, patience, or the phone number for a good exorcist.

Sounds horrible, right? Actually it was magic at the time - and in science processing we're heading right back there today.

Consider, for example, IEED. With IEED we're contemplating putting a bunch of cell processors on a satellite and expecting them to run error-free for years spent in low earth orbit. That's not impossible, or even improbable - and both IBM and Sony have the numbers to prove it.

What neither one has, however, is anything very convincing on Linux and firmware reliability in interaction with our stuff - in large part, of course, because we haven't invented our stuff yet.

Nevertheless, the size of the data flows involved means that our reliability problem is going to mimic what happened when the first Unix arrays became affordable - minus, one hopes, the continuous squeal.

Thus a single 36-bit, 8000 x 8000 spectrogram is going to spill over 256MB before on-board compression, and even with a 66% reduction the transmission will fill eight 1TB disks every day - meaning that we'll need two groups of about 280 disks on-line if we want to maintain both system-level redundancy and two days of slack in the secondary storage procedure against a 4% per year expected disk failure rate.
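A quick sanity check on those per-image and per-day figures, as a minimal Python sketch using decimal units; the ~280-disk working set then depends on whatever retention window, redundancy scheme, and spare margin you layer on top of the roughly eight disks a day, which the paragraph above doesn't fully spell out.

    # Back-of-the-envelope check of the raw numbers above (decimal units assumed).
    # Inputs from the text: 36 bits/pixel, 8000 x 8000 pixels, one image per
    # second, 66% on-board compression, 1TB disks.

    BITS_PER_PIXEL = 36
    PIXELS = 8000 * 8000
    SECONDS_PER_DAY = 86_400

    raw_bytes = PIXELS * BITS_PER_PIXEL / 8           # ~288 MB per spectrogram
    compressed_bytes = raw_bytes * (1 - 0.66)         # ~98 MB after compression
    daily_bytes = compressed_bytes * SECONDS_PER_DAY  # ~8.5 TB per day

    disks_per_day = daily_bytes / 1e12                # ~8.5 one-terabyte disks

    print(f"raw image:    {raw_bytes / 1e6:.0f} MB")
    print(f"compressed:   {compressed_bytes / 1e6:.0f} MB")
    print(f"daily volume: {daily_bytes / 1e12:.1f} TB -> ~{disks_per_day:.0f} x 1TB disks/day")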

And if those numbers just don't sound big to you, bear in mind that a guy pushing a (rather large) grocery cart full of tapes would move a month's worth of data from Stanford to Berkeley in half the time it would take using a dedicated 10MBS link.
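To see how lopsided that comparison really is, here's a rough sketch under two assumptions of mine that aren't in the text: a month is 30 days at the ~8.5TB/day computed above, and "10MBS" is read as a 10 megabit per second link.

    # Sneakernet vs. dedicated link, very roughly.
    DAILY_TB = 8.5                       # from the calculation above
    month_bytes = DAILY_TB * 1e12 * 30   # ~255 TB for a 30-day month (assumption)

    link_bps = 10e6                      # "10MBS" read as 10 Mbit/s (assumption)
    link_seconds = month_bytes * 8 / link_bps
    link_days = link_seconds / 86_400

    print(f"month of data: {month_bytes / 1e12:.0f} TB")
    print(f"over the link: ~{link_days:.0f} days (~{link_days / 365:.1f} years)")
    # Even read generously as 10 MByte/s the link needs the better part of a
    # year - the grocery cart wins by far more than the factor of two claimed.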

In that same vein, suppose default on-board processing takes 12 seconds per image on a dual-cell blade (i.e. about four x86 minutes at 3.2GHz); then a one-per-second acquisition rate means we'll need 12 of them running all the time just for that job. Focus processing (when people on the ground think something's worth a more careful look) might take a minute per image set - and happen in bursts during which every second counts. Since we don't know what the failure rate will be, prudence suggests erring on the side of risk reduction - choosing to orbit perhaps twenty dual-cell blades, thus providing eight for use during failure conditions or for assignment to ground workstations during focus events.
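The blade count is just throughput arithmetic; a minimal sketch using the 12-second processing time, one-per-second acquisition rate, and twenty-blade total from the paragraph above.

    # Sizing the on-orbit compute pool under the stated assumptions.
    PROCESS_SECONDS_PER_IMAGE = 12   # default processing on a dual-cell blade
    IMAGES_PER_SECOND = 1            # acquisition rate

    # Blades needed just to keep up with the default pipeline:
    baseline_blades = PROCESS_SECONDS_PER_IMAGE * IMAGES_PER_SECOND   # 12

    TOTAL_BLADES = 20                # the risk-reduction figure suggested above
    spare_blades = TOTAL_BLADES - baseline_blades                     # 8

    print(f"baseline pipeline: {baseline_blades} blades")
    print(f"spares for failures / focus bursts: {spare_blades} blades")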

IEED is a bit esoteric, but a lot of other big data users are pretty mundane: 10TB for the average HD movie in its final form, a few tens of megabytes for each of a few million users on community sites, more for search and service sites - while PPC-based controllers in military applications like remote reconnaissance drones already handle upwards of 10MByte/second in real time, 24 x 7, all the while accumulating terabytes of analytical data that's mostly never looked at simply because the volumes are too large.

Thus the bad news for science processing is that it's back to the eighties - but this time with three or four new zeros tacked on the end of every measure. And the good news? We not only survived it then, but retrospectively found it fun - plus, in those days it took years for the experience we gained doing it to become commercially valuable; now that's happening almost in real time.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.