This is the 14th excerpt from the first book in the Defen series: The Board Member's IT Brief.
This section is concerned with things you should talk to your CIO about - informally, but with attention.
Basically, what you want to know is this: what happens to the business if your primary data center, along with the people who work there, suffers a disaster? Whether it's a flood, a fire, terrorism, or Legionnaires' disease makes little difference.
The right answers differ between Unix CIOs and mainframe/Windows CIOs, but both should be concerned about the same three things:
If data center A gets blown up, flooded, or otherwise shut down, the people at data center B should be unaffected and fully able to take over the workload.
Thus an answer that nods to redundancy through some backup data center but relies on the same people who work at the primary site is incomplete at best, and more hope than plan at worst, if your organization is of any reasonable size at all (say 100 or more total staff).
Both the original data processing environment and its modern descendant, the tightly locked-down client-server operation, depend on management to coordinate the activities of large numbers of people.
As a result, it's usually not practical to maintain two fully staffed data centers unless your organization's operations naturally lend themselves to this through size, geographic dispersion, or other external circumstances.
Note that many CIOs will tell you that hardware failure is a frequent occurrence against which their clustering technology provides full protection. This is true: having one server die in a room full of racks stuffed with them should mean nothing. But clustering offers no continuity protection if the building blows up or the clustering expert gets hit by a bus.
If you do have two centers, great. With two, your CIO needs to:
In the more common case, however, you cannot afford duplicate processes and you face practical constraints on cross-training or otherwise practicing disaster recovery. In this situation, what you need starts with a contractually committed backup site and well-developed plans with carefully defined checklists of core activities, responsibilities, and backup communications channels, and, most importantly, a clearly defined line of succession in control of every major function.
Your CIO should be aware, furthermore, that such plans essentially never survive the first hour or two after the disaster strikes. People don't act according to plan; communication almost always fails; the critical license or permission will turn out to pertain only to the destroyed machine; the designated successor to a staffer put out of action by the emergency will turn out to be on stress leave; data backups will turn out to be badly out of date or unreadable; and the custom applications carefully stored at the backup site will inevitably turn out to be missing some critical patches your business people rely on, patches whose absence leaves the backup copies unable to work with the updated database structure from the production site.
Your CIO has to have a detailed disaster recovery plan, but the key issue here is realism. Does your CIO recognize that Murphy's law goes into overdrive when a disaster happens: that tired people will make every possible mistake (usually twice), and that it is nearly inevitable that something, like a failed database patch, will stop every attempt to restore normal operations?
Systems integrity is like a chain: it breaks at the weakest point
A pair of 2003 "Dear Member:" emails from CIPS (the Canadian Information Processing Society, an organization from which I had long since resigned in protest over its Windows-only website and its addiction to hiding important email in floods of junk) illustrates many typical Microsoft environment management problems:
In the early hours of May 8, 2003 there was a break in and entry at the CIPS National Office. Two servers and one computer system were stolen.
The second one said (among other things):
After further review I am now in a position to verify with you that the on-line membership renewal process is a secured process. Any credit card information provided is encrypted. This is different from what was reported yesterday.
My bet? That the transactions were encrypted only in transit, and that everything needed to access the SQL Server database was stored on the stolen desktop machine.
Your CIO should also be aware, and should therefore make you aware, that users will have bypassed systems controls in at least some areas, and every one of those bypasses will haunt the organization during a recovery effort.
Thus the CIO may think all key data is stored on his servers, but it won't be. He may think he's aware of all the legal commitments (things like loans secured by changing inventory) that require access to your information systems, but he'll be wrong about that too.
Nobody can do anything about problems like these until they come up, but someone who doesn't know that they will come up is dangerously naive.