A couple of weeks ago Science Daily had a headline worthy of serious admiration: United States Death Map Revealed. Here's the introduction:
A map of natural hazard mortality in the United States has been produced. The map gives a county-level representation of the likelihood of dying as the result of natural events such as floods, earthquakes or extreme weather.
Susan Cutter and Kevin Borden, from the University of South Carolina, Columbia, used nationwide data going back to 1970 to create their map. According to Cutter, "This work will enable research and emergency management practitioners to examine hazard deaths through a geographic lens. Using this as a tool to identify areas with higher than average hazard deaths can justify allocation of resources to these areas with the goal of reducing loss of life".
And here's the paragraph containing the bit I thought applicable to how data centers are managed:
Heat/drought ranked highest among the hazard categories, causing 19.6% of total deaths, closely followed by severe summer weather (18.8%) and winter weather (18.1%). Geophysical events (such as earthquakes), wildfires, and hurricanes were responsible for less than 5% of total hazard deaths combined. Cutter said, "What is noteworthy here is that over time, highly destructive, highly publicized, often catastrophic singular events such as hurricanes and earthquakes are responsible for relatively few deaths when compared to the more frequent, less catastrophic events such as heat waves and severe weather (summer or winter)".
This kind of perceptional misjudgment comes from two main interacting causes:
In Canada, for example, it's an annual tradition for our national television networks to have reporters fly into Saskatchewan, pan a camera around a dried-out mud flat south of Regina, and gravely report some conservative's personal responsibility for a crop-crippling worldwide drought.
In that context, therefore, consider this rephrasing of the Science Daily writer's key paragraph - keeping his (wholly inapplicable) numbers:
Network software failures ranked highest among the hazard categories, causing 19.6% of total client crashes, closely followed by client software failures (18.8%) and server software failures (18.1%). External events such as virus storms, denial of service waves, and worm wildfires were responsible for less than 5% of total system-wide failures combined. Cutter said, "What is noteworthy here is that over time, highly destructive, highly publicized, often catastrophic singular events such as network worms and phishing attacks are responsible for relatively few failures when compared to the more frequent, less catastrophic events such as software failure and administrator error."
Fundamentally, our intuitive understanding of what's really going on is vastly distorted by the relative publicity accorded a few large-scale events. The saddest examples I know of are that the PC industry has yet to notice that the billions being spent on "security" are driving attack profitability; that most PC network operators don't know their networks are failing because user reboots hide the problem from help desk operators; and that the "big bang" impact of mid-nineties and earlier Unix server pricing still overshadows the reality that any machine capable of running a licensed Windows OS can also run a free, and more efficient, Unix.
Thus the basic lesson from the death map should be clear: before making any decision, consider the perceptional basis for the assumptions behind your choices.