% fortune -ae paul murphy

Hardware comparability in benchmarking

Last week's discussions under my "purloined benchmark" title included this bit from ShadeTree:

if you are comparing OS performance....
... and you are not using the same hardware You cannot tell if the performance is related to the software or is merely a reflection of better hardware. I proposed you choose the hardware and run a benchmark. I suspect that the results will be the same regardless of the hardware chosen. As I have stated before I do have a background in Unix and in fact started there. I am also involved in the deployment of Linux. I have also stated that linux has a role in IT albeit not on the desktop. As for the configuration in the benchmark I believe it to be typical of many small business systems. I don't think it was biased in any way.

Quite a lot of people believe this - basically that you can only test which of two or more OSes better supports an application by running the application on identical hardware under each OS. In my opinion, however, the core hardware should be the same, but the configuration should be adapted to show each OS at its best.

The specific configuration I object to in the Pedigo benchmark is this:

For the purposes of these tests, it was important that the servers for both the Linux and Windows clusters be configured as identically as possible. The following configuration options were chosen for the RAC clusters:

Four HP® ProLiant DL380 G4
Two Windows servers

RacBench1
RacBench2

Two Linux servers

racbench3
racbench4

Two Intel EM64T Xeon processors per server (4 logical processors with Hyperthreading), 3.4 GHz
Two 36 GB SCSI disks per server, configured as RAID 1
8 GB RAM per server
8 GB swap space/paging file per server
Two Gigabit NICs per server
One Qlogic 2340-E HBA per server

Suppose I were to argue, in the context of the workload selected and the metric applied, that this configuration should be changed on all four servers by:

adding another 8GB to each;
adding two more storage connectors to each;
replacing both 1Gb NICs with a single Neptune 10Gb card (I know this card wouldn't work on the 2004 Linux release used in the benchmark, but let's pretend - and if you don't like that, just pick the fastest single card supported in that release); and,
adjusting Oracle's set-up to separate logging and sort I/O from regular I/O while setting the SGA limits and block sizes to their respective maximums for this Linux configuration.

Now if we were to run the same benchmark workload, and apply the same metrics, Linux would win easily because:

a Windows server application that will just barely fit in 8GB will run measurably more slowly in 16GB - but the paging delay encountered on Linux at 8GB will simply disappear;
the use of only one network card will be a performance killer for Windows but lead to a slight throughput improvement for Linux;
the Windows preference for mirroring the two internal drives now has almost no effect on Linux - since almost all of the paging has disappeared; and,
the larger Oracle limits will have more effect on Linux than on Windows.

Notice, however, that the configurations are still identical - and so, according to ShadeTree and many others, the comparison is still completely fair.

In reality, however, it isn't - because identical is unfair and misleading if the configuration markedly favors one OS over another.
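
To make the logic concrete, here is a toy model in Python. Everything in it is an assumption made up for illustration: the two profiles (os_a, os_b), the scoring formula, and every number are hypothetical and are not measurements of Windows, Linux, or the Pedigo benchmark. All it shows is that two configurations can both be "identical" across the machines and still pick opposite winners.

```python
# Toy model: all profiles and numbers are hypothetical, not benchmark results.
from dataclasses import dataclass


@dataclass(frozen=True)
class OsProfile:
    name: str
    peak_score: int           # score when RAM exactly matches the working set
    working_set_gb: int       # memory the workload wants under this OS
    paging_cost_per_gb: int   # score lost per GB of RAM shortfall
    surplus_cost_per_gb: int  # score lost per GB of RAM beyond the working set


def score(profile: OsProfile, ram_gb: int) -> int:
    """Made-up score for one OS profile on a machine with ram_gb of memory."""
    shortfall = max(0, profile.working_set_gb - ram_gb)
    surplus = max(0, ram_gb - profile.working_set_gb)
    return (profile.peak_score
            - shortfall * profile.paging_cost_per_gb
            - surplus * profile.surplus_cost_per_gb)


def winner(profiles, ram_gb: int) -> str:
    """Name of the profile with the highest score at this RAM size."""
    return max(profiles, key=lambda p: score(p, ram_gb)).name


# Hypothetical profiles: os_a fits in 8GB, os_b wants 12GB.
OS_A = OsProfile("os_a", 100, 8, 10, 1)
OS_B = OsProfile("os_b", 110, 12, 5, 0)

for ram in (8, 16):
    # Both machines get the same RAM each time, so each test is "identical".
    print(ram, winner([OS_A, OS_B], ram))
```

With these invented numbers os_a wins when both machines have 8GB and os_b wins when both have 16GB. Neither run says much about which OS is better; each mostly reflects whom the shared configuration happened to suit.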

The fair thing to do, therefore, is to have competing teams of experts configure comparable base machines to reflect both the OS of their choice and the application - because, bottom line, Linux isn't Windows and it doesn't use identical hardware identically.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.