% fortune -ae paul murphy

Software Quality at Web speed

Two topics people never seem to tire of talking about are software quality and the web. Put them together and what you have is a new way of using one to gain insight into the other.

After joining Sun in 1985 Wayne Rosing became largely responsible for key workstation engineering, the drive to full SMP, and the adoption of the scalable processor architecture - SPARC.

After leaving Sun in the early nineties he took what amounted to a sabbatical to work in astronomy and then became the VP of engineering for three years at Caere, a manufacturer of PC-oriented OCR software.

From there he moved to Google to help direct infrastructure development.

Here's a fascinating bit from an ACM Queue interview he did in September 2003 with David J. Brown, also a Sun/Stanford emeritus. It's a long interview, well worth reading carefully for its comments on teamwork and unconventional engineering management, but in this section he's talking about the cost and business implications of faults in software:

WR: So Sun has to come out with a product, as represented with a software interface, for which, in some sense, the minimal acceptable standard is perfection. Now, we all know that you never really, truly achieve that. Because you can't. Sun cannot possibly test every conceivable use that its software is put to, it's a pretty tough software engineering problem.

Caere represented a different dimension. Its product was an end-user product that parsed bits from a scanner and turned it into text. That's a very imprecise science, at best, but that software had to work reliably. And because we were a small company and we were using third-party distribution, we had economic constraints that basically said that the CD had to be perfect, in the sense that it would never experience a recall. Again, an impossible task, but one that we got very close to achieving.

Google is very different. First of all, if we make a mistake, as soon as we see it on the site we can have the engineers go figure out the fix. We can push the software in a matter of hours, and we can update it. If we make a mistake on our own site, short of bringing the thing down, which, of course, we'll know instantly, we can fix things, because we don't have this problem of software recall or the associated revenue problems.

On the surface he's talking about the trade-off between cost and product quality for three very different product environments. Read the entire interview and you'll see he repeatedly reinforces the message linking quality concerns to cost consequences and then tying those back to the customer's perspective.

That's good stuff, but I also see something a little deeper here: I see a man thinking about the quality implications of the distribution channel and coming to the realization that shortening channel response time, for any product, increases the need for quality - an explanation for open source quality and quite the opposite of MBA textbook opinion.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specialising in Unix and Unix-related management issues.