Facing what isn't your fault

By Paul Murphy, author of The Unix Guide to Defenestration

Ever been hauled into a room filled with hostile faces and told that your system isn't working? I have, and I'll bet you have too.

Retorting hotly either that it %$% well is, or that the ICs (idiots in control) picked and configured the hardware, is kind of career limiting. The right thing to do is to feel their pain - and recognize that it's usually their system, not yours, that isn't working.

Nine times out of ten the problem is outside your scope of control, but you may still be able to offer more positive help than merely sharing their frustration. In most cases the underlying villain is complexity. It doesn't matter whether that takes the form of five layers of switches between user PCs and your server, a SAN implementation, the result of help from an outside Oracle expert, or a detour through NT for every user query to your database. Your challenge is to help by finding acceptable simplifications.

The single most successful thing I've done in these circumstances is sell the idea of a small test machine right in the user area as a means of finding out what the problems really are. Data center management always objects, but an older machine borrowed from Sun can seem non-threatening and so, if users are angry enough, you may find yourself trying to show that a small machine in the user area can outperform the big machine in the data center.

For database applications the keys to success are:

  1. max out on memory - remember, you can get 2GB of Sun RAM for the price of one day's tuning expertise from a consultant - and you get to keep the memory.

  2. use locally connected raw devices on dual SCSI controllers, with database rather than OS mirroring; and,

  3. get that machine on the local subnet so no packets go through the switch layers.

Properly set up, Sybase and Oracle on Sun 250s can seem to massively outperform the same products on pSeries machines in the data center; I've seen it happen. When you run big SQL jobs, the pSeries machines are faster, but when users access the local box they get instant response.

As I said, the key issue is usually complexity. Hardware by itself isn't the answer, but it's a lot cheaper than people time and you can usually get it in place a lot quicker than you can get procedures changed. And once users see that you're on their side and can deliver results, they'll give you glowing recommendations as you look for a new job.