Disaster planning has long been treated as a sideline project for IT departments, not worthy of serious focus or budgetary concern.
Then came Sept. 11, and in its wake the need to prepare for catastrophe has become a front-and-center necessity for businesses big and small. Indeed, business continuity planning has become an industry unto itself, spawning a new and revised view of what it means to prepare for a disaster.
Today, the idea that a disaster will strike has shifted from a possibility to a probability in the minds of corporate officials, forcing technology staffs to realize that downtime for any reason is unacceptable.
Highlighting this is the attention CFOs and CEOs now place on Information Technology (IT) resources. Recovery Time Objectives (the amount of time it takes to return to full productivity after a disaster) and Recovery Point Objectives (the amount of data loss deemed acceptable after a disaster) have shrunk to the point that tape backup systems can no longer protect the enterprise.
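To make the Recovery Point Objective concrete, here is a small illustrative check, with hypothetical numbers, of whether a backup scheme can meet a given RPO. The `meets_rpo` function and its figures are assumptions for illustration, not a real product's logic:

```python
# Hypothetical illustration of a Recovery Point Objective (RPO).
# Worst-case data loss is roughly the time since the last backup,
# so the backup interval must not exceed the RPO.

def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True if worst-case data loss stays within the RPO."""
    return backup_interval_hours <= rpo_hours

# Nightly tape backups risk up to 24 hours of lost data...
print(meets_rpo(24, rpo_hours=1))    # False: tape cannot meet a 1-hour RPO
# ...while near-continuous replication keeps loss close to zero.
print(meets_rpo(0.01, rpo_hours=1))  # True
```

This is why shrinking RPOs push enterprises away from periodic tape backup and toward continuous replication.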
Companies are now expected to practice Disaster Prevention, the part of business continuity concerned with ensuring that vital data resources are protected and operate uninterrupted regardless of circumstances. This is the proverbial “five nines” of reliability and uptime (99.999% availability), which permits a maximum of only a few minutes of unscheduled downtime annually, a lofty goal that is difficult to meet even in the best of circumstances.
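The arithmetic behind “five nines” is straightforward. A quick sketch of the annual downtime budget that each availability level allows, assuming a 365-day year:

```python
# Downtime budget implied by an availability target (365-day year assumed).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def annual_downtime_minutes(availability: float) -> float:
    """Maximum unscheduled downtime per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

for label, availability in [("three nines (99.9%)", 0.999),
                            ("four nines (99.99%)", 0.9999),
                            ("five nines (99.999%)", 0.99999)]:
    print(f"{label}: {annual_downtime_minutes(availability):.1f} minutes/year")
# Five nines works out to roughly 5.3 minutes of downtime per year.
```

Each added nine cuts the allowable downtime by a factor of ten, which is why each is dramatically harder to achieve than the last.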
How did business reach this level of necessity? Where did the idea of business continuity evolve from, and how did we reach this new plateau of standards to keep business up and running? What circumstances brought the enterprise from veritable carelessness only a few short years ago to the vital vigilance that we see today? While Sept. 11 may have shone a spotlight on this necessity, the evolution of data protection systems has been an ongoing process since the dawn of Information Technology.
New Technologies, New Problems
Prior to the dot-com craze, enterprises kept data either on paper or some other physical media (e.g., hardcopy, punch cards). Because it was not stored in digital form, business data could, in theory, survive any digital disaster.
However, it was still susceptible to physical disasters such as an earthquake or fire, so to combat these risks companies began storing physical copies off-site in a repository or secure facility. (This is a critical concept, as the theory of off-site storage later crosses over into the digital world in a nearly identical form.)
In addition, mainframe systems of the time were backed up to heavily protected magnetic tape. The data kept on the mainframe was generally used in conjunction with physical hardcopy, so losing the system for a day while data was restored wouldn’t bring down the enterprise.
Then came the widespread deployment of desktop computers, and data was no longer safe, because employees were not storing vital corporate data on the company mainframe or in physical files. Suddenly, power fluctuations, physical anomalies, and a host of other disasters could wipe out valuable data with no way to restore it.
Recognizing the risk this presented, backup systems and office-based servers emerged to protect corporate data on PCs in the same way that paper files and mainframe magnetic tapes were protected.
Moving forward, smaller server systems, which offered a more flexible and economical alternative to the older, slower mainframe systems, were implemented. Not surprisingly, mainframe systems began to dwindle and disappear, even in the enterprise space.
This new computing power presented a host of new problems, not the least of which was data loss. Once again there were worries about getting vital data from the desktops to the server systems, and the server systems themselves were barely more secure than the desktops.
The next solution was to construct more complex backup systems that shifted data from the volatile servers to somewhat more stable backup media, such as magnetic tape. Having tapes to restore lost data, or even entire data-systems, in the event of a disaster seemed like the perfect solution.
Downtime Drains Bottom Line
But as companies became more and more dependent on data-systems, it became apparent that waiting to restore data caused not only revenue loss but also damage to customer relationships and the company’s reputation. To address the need for virtually no downtime, disaster recovery services were born.
The field of Disaster Recovery Services (DRS) is extraordinarily broad, but its purpose is simple: Restore data to a downed or corrupted server system or other data-system as quickly as possible. DRS extends from re-configuring tape systems to make them more reliable and faster to keeping duplicate servers on standby to allow them to take over at a moment’s notice.
Business Continuity Planning (BCP) is Disaster Prevention in action: the science of determining ways to allow data-systems to continue working even if an entire physical location is downed or destroyed, based on the idea that data-systems are portable objects.
This is a fundamental shift in thinking from the days of mainframe-based enterprise computing, when the system was, for the most part, the hardware. Today the operating systems and software packages are considerably more important than the hardware, as they determine the capabilities of the system itself. Once business IT staffs accepted this, the doors of BCP were flung wide open and the distributed data-system emerged.
No longer was the corporate data-system at the mercy of a single point of failure. The entire data-center could be grouped, clustered, and manipulated as a single entity to protect the data of the enterprise. For example, IT staffs built load-balanced web sites whose server groups could absorb the load of one or even several downed machines, and created e-mail server groups that spanned the country, each able to hold messages for an offline counterpart.
But, as with any great plan, there was one significant flaw: the data-center itself could be the single point of failure. As we have seen recently in California, even the most redundant data-center can fall victim to a power grid failure, and when the diesel backup generators finally run out of fuel, the data-center, and corporate data, go offline. BCP had come a long way, but still had a long way to go before achieving the mythical “five nines.”
Transcending Physical Boundaries
Stepping up, mega-storage companies like EMC produced storage systems that could replicate themselves to other data-centers outside the same physical vicinity, a strategy similar to the one businesses had used years before to protect physical data like punch cards and hardcopy. Replication meant the entire body of corporate data could be kept up to date in another location, thereby protecting against failure due to the loss of a physical site.
Initially it seemed like an ideal solution, but it was really only Disaster Recovery on a much larger scale. The data was safe in a secondary location but inaccessible, because the servers were still located at the primary site and attached to the primary storage device. So until the primary location could be brought back online, the data was unavailable and the company was losing revenue.
Clustering, while a good solution for single-site High Availability, could not be stretched to multiple sites and therefore couldn’t protect data in the event of the loss of any physical location. Another leap forward was required to fully address the situation.
Expanding on data replication, companies like NSI Software began developing BCP software that allowed both the data-system and its data to survive a physical site failure by transcending physical boundaries. By enabling the enterprise to eliminate the single point of failure in a cost-effective manner, real-time replication software has made ensuring business continuity an attainable feat for businesses of all sizes.
With replication software, clusters no longer needed to be physically connected to a shared storage array, and with High Availability systems, stand-alone machines could stand in for each other no matter where they were physically located.
In addition, by utilizing platform- and storage-independent data structures, these replication products allowed IT staffs to create duplicate hardware and software configurations in multiple physical locations that could share data and keep each other up to date. Systems could now stand in for each other seamlessly at a moment’s notice, without end-users having to perform any tasks or even noticing the change. Essentially, end users continue to work, uninterrupted, while the data-systems handle the tasks of taking over the data-processing load for their downed counterparts.
Examples of the value and capabilities of replication software are easy to see. Failure of an Exchange e-mail system in New York City can now seamlessly switch to a physical system in Detroit, without the CEO (or anyone else) missing a single message. Knowing the information is available, the IT staff can then correct the issues in NYC and fail-back the physical systems to restore them to their original state, without the pressure and rushing that often causes even more damaging mistakes than the original outage.
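The NYC-to-Detroit failover described above can be sketched as simple routing logic. The site names and the health flag here are illustrative assumptions; real replication products detect failure and redirect traffic transparently at the storage or application layer:

```python
# Minimal sketch of failover between replicated sites.
# Site names and the health-check flag are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    healthy: bool = True

def active_site(primary: Site, standby: Site) -> Site:
    """Route to the primary while it is healthy; otherwise fail over."""
    return primary if primary.healthy else standby

nyc = Site("NYC-Exchange")
detroit = Site("Detroit-Exchange")

print(active_site(nyc, detroit).name)  # NYC serves while healthy
nyc.healthy = False                    # the primary site goes down
print(active_site(nyc, detroit).name)  # traffic fails over to Detroit
```

Fail-back reverses the same logic once the primary site is repaired, which is what lets the IT staff restore the original configuration without rushing.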
Finally, the goal of achieving the “five nines” can be met, signaling the conclusion of a monumental paradigm shift: from keeping everything on physical media that could be duplicated off-site to a digital world of self-healing data-systems that create the truly digital, always-on enterprise.
Mike Talon has been working in the Information Technologies field for more than 10 years, specializing in data protection and disaster prevention. He currently works for NSI Software, a developer of data replication technologies and services. He can be reached at firstname.lastname@example.org.