Draft Blog Entries

% fortune -ae paul murphy

Some general considerations for small systems

Let's assume a scenario under which the older system you're thinking about upgrading is relatively small, support costs are high, and you can't obviously transfer its workload to some other, larger, machine with adequate idle capacity.

Specifically, let's assume you have a Sun 490 from a few years ago (4 x 1.8GHz USIV, 16GB, 4 x 73GB) that's still under support and runs an engineering document and database application critical to everyone from R&D to the people handling customer warranty claims.

It works, but the hardware is getting old - and support costs seem outrageous relative to the nominal cost of PC style servers: your predecessor signed up for full 24 x 7 Gold level support at nearly $8,000 per year - about 10% of the nominal list price when he bought it - and lots of people claim you can get ten PC servers for that: roughly one for every six weeks in support costs.
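The arithmetic behind that comparison is easy to check. The sketch below uses only the figures quoted above; the implied list price and the implied per-server PC price are derived from them and are not numbers anyone quoted, so treat them as rough.

```python
# Figures quoted in the text.
annual_support = 8_000        # dollars per year for 24 x 7 Gold support
support_share_of_list = 0.10  # "about 10% of the nominal list price"
pc_servers_per_year = 10      # "ten PC servers for that"

# Derived values (assumptions that follow from the quoted figures).
implied_list_price = annual_support / support_share_of_list
implied_pc_price = annual_support / pc_servers_per_year
weeks_per_pc_server = 52 / pc_servers_per_year

print(f"implied list price:       ${implied_list_price:,.0f}")
print(f"implied PC server price:  ${implied_pc_price:,.0f}")
print(f"weeks of support per PC:  {weeks_per_pc_server:.1f}")
```

Note that ten servers a year works out to one about every five weeks, so the "six weeks" figure is a loose rounding rather than an exact number.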

This is, in other words, Red Hat's dream scenario - the primary one their anti-Sun campaign targets, and the one in which you're supposed to believe that buying a free Linux from them will give you better performance for less money.

In this situation the key things to consider are:

your tolerance for system failure;
your tolerance for security (in the PC sense) risk;
constraints on future change opportunities;
I/O limitations and storage growth rates; and,
staffing related issues.

Note that application level SPARC compatibility is not directly an issue - any application can be either migrated or replaced if the incentives for doing it justify the risk and costs involved. It's easier, of course, to upgrade to binary compatible HW/OS combinations, but that's a cost/benefit issue, not an absolute.

The failure tolerance issue comes down to this: SPARC (and Power) systems are built to higher quality standards than x86 ones - and that's true whether you're comparing at the low end, mid range, or high end in each category. As a result the issue here is whether you care about the quality you're paying for with that 490.

It's a low end machine for SPARC, but to match the quality in the x86 world you have to go to the higher end stuff: typically Compaq's ProLiant line, and that costs more than a new SPARC machine would. To make hardware savings, therefore, you have to be willing to accept a higher risk of hardware failure - so this comes down to how much of that you can tolerate.

All management speak aside, this is ultimately a gut call: my own rule of thumb is that if your users can see a cost difference between eight hours a year in downtime and two, then sticking with the higher end gear will be the right thing to do even if that cost difference seems smaller than the hardware savings.
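As a minimal sketch of that rule of thumb: the function names, the cost per hour, and the "noticeable" threshold below are all hypothetical, chosen only to show the shape of the decision. The eight and two hour figures are the ones from the rule itself.

```python
def downtime_cost_gap(cost_per_hour, low_end_hours=8.0, high_end_hours=2.0):
    """Yearly cost difference between the two downtime levels in the rule."""
    return (low_end_hours - high_end_hours) * cost_per_hour


def prefer_high_end(cost_per_hour, noticeable_threshold):
    """True if users would notice the downtime cost gap.

    noticeable_threshold is an assumption: the smallest yearly cost
    your users would actually see. Hardware savings are deliberately
    not an input, because the rule applies even when the gap looks
    smaller than the savings.
    """
    return downtime_cost_gap(cost_per_hour) >= noticeable_threshold


# Hypothetical numbers: $500 per hour of downtime, users notice $1,000 a year.
print(downtime_cost_gap(500))       # six hours at $500 per hour
print(prefer_high_end(500, 1_000))
```

The point of leaving hardware savings out of the function is that this is a gut call about visibility, not a break-even calculation.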

The security (in the PC sense) issue is this: you only care about the risk of attacks that work or could work - meaning attacks that exploit code or process vulnerabilities in ways that can be directed against you. Since every OS and application has code vulnerabilities, and every process involves people and/or networking, the determining factor is how high the exploit barrier is.

In the x86 world exploits are virtually synonymous with vulnerabilities, but because this isn't true for PPC or SPARC the barriers there are much higher - witness, for example, Apple's transition from a company that could build a security reputation while ignoring vulnerabilities on PPC to an x86 maker that's rapidly losing its reputation for security despite obsessive patching.

Again the question is one of comparing risks to possible costs and other consequences: basically, the worse the consequences of a successful attack could be for you, the further you want to stay away from x86 - and if there's a genuinely compelling reason to use x86 in a high value situation, bite the bullet on porting your application to OpenBSD and have security experts go over your code line by line as part of that process.

The opportunity cost issue on software change is one of the hardest to get your head around. The question is at what point change now starts to significantly drive up the cost of future change. In the obvious version of this you make a change decision today, and tomorrow's vendor announcement means you've spent the money buying the wrong thing. The more interesting, and more subtle, version is that you spend your change budget (including non dollar spending like stressing out user management's tolerance for change) and tomorrow one of your people comes up with a new idea that you really want to implement but can't - and one thing I'll guarantee you is that nobody on your staff will really buy into your reasons for saying no.

This is where the option of doing nothing as long as possible really shines: the maxim about a tax delayed being a tax unpaid works here - if it's Unix, and it works today, leaving it alone will pretty much guarantee that it works tomorrow - and, in these kinds of situations, that can be a good thing.

In contrast to opportunity costs, the storage issue is dead simple: those 73GB disks in the 490 can be upgraded to 146GB at minor cost, but going beyond that means either getting an external JBOD or trading off significant new costs against performance. Either way, once volumes get much past 4 x 146GB, the fact is that new gear with terabyte disk sets and full warranties usually combines lower cost with lower risk and higher performance relative to adding disk to old systems.
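To put rough numbers on that threshold, here is a small sketch. The disk sizes and the four internal slots come from the configuration described earlier; the current usage and growth rate in the example are hypothetical, and the capacity figure is raw, ignoring whatever mirroring or RAID overhead you actually run.

```python
import math

DISK_SLOTS = 4  # internal disks in the 490 as configured above


def raw_capacity_gb(disk_gb):
    """Raw internal capacity, before any mirroring or RAID overhead."""
    return DISK_SLOTS * disk_gb


def years_until_full(used_gb, annual_growth, capacity_gb):
    """Years of compound growth before used_gb reaches capacity_gb.

    annual_growth is a fraction, e.g. 0.30 for 30% a year.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if used_gb >= capacity_gb:
        return 0.0
    return math.log(capacity_gb / used_gb) / math.log(1 + annual_growth)


print(raw_capacity_gb(73))    # current disks
print(raw_capacity_gb(146))   # after the cheap upgrade

# Hypothetical workload: 200GB in use today, growing 30% a year.
years = years_until_full(200, 0.30, raw_capacity_gb(146))
print(f"{years:.1f} years until the upgraded disks are full")
```

If the answer comes out at a year or two, the disk upgrade only postpones the new gear decision; if it comes out much longer, the cheap upgrade may be all you need.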

And, finally, there are staffing issues. People will tell you that switching from Solaris to Linux will make it easier to find qualified staff, but that isn't true. Unix skills are usually easily transferable: if your current staff can keep their hands off that 490 running Solaris, they can probably keep their hands off a Linux replacement machine too - and, similarly, if you can hire someone who can get Linux set up and running properly, the chances are that Solaris won't give them any trouble either.

Conversely, if your staff reports that 490 as unreliable, the one thing you can be assured of is that they're causing those failures - and not only will they do the same thing to a Linux replacement, but whatever root cause (usually a manager whose skillset doesn't match the technology) is driving this will also limit your ability to retain any new people you bring in with better skills.

Thus the positive bottom line on staffing is that if your shop is working well, there won't be anything scary about transitioning between Linux and Solaris - in either direction.

Conversely, if what you've got is a skills-technology mismatch, you have two choices: change the people, or change the technology - and do it before you change anything else, because not facing up to the issue condemns you to a long and slow death by a thousand failures.

So what's the real bottom line on all of this? Support costs may be a lever for getting people thinking about change, and technology continuation may have value for you, but in the end these kinds of decisions almost always come down to intangibles: guesses about future risks and opportunities, not the small dollars involved in support contracts.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.