Data storage expert Henry Newman details the unique problems in planning for disaster recovery for large archives -- a growing concern, given the exponential growth in today's data archives.
Disaster recovery (DR) is often discussed in broad terms throughout the storage industry, but in this article I will explore a specific segment of the overall market: DR planning for large archives.
This is the first article in a two-part series on archives. The next article will cover architectural planning for large archives.
First, how do I define an archive, and what counts as a large archive? An archive is a repository of saved information, most of which is infrequently accessed.
The definitions of archives have changed recently. Just three or four years ago, archives were always on tape, with only a small disk cache (usually less than 5% of the total capacity). The software to manage data on tape and/or disk is called hierarchical storage management (HSM) and was developed for mainframes more than 35 years ago.
Today we have large disk-based archives that back up data over networks. For example, both my work PC and home PCs are backed up via the internet, and large cloud-based archives are common. There is, of course, a question of reliability (see "Cloud Storage Will Be Limited By Drive Reliability, Bandwidth"), but that is a different topic.
My definition of a large archive is fairly simple: anything over 2,000 SATA disk drives. Today, that is about 4 PB, and next year it will likely be 8 PB when drive capacities increase. I use 2,000 drives as the threshold because of the expected failure rate across that many drives. Even in a RAID-6 configuration, which would require 2,400 drives, managing that many drives for a single application will be challenging given the rebuild times.
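The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 TB and 4 TB drive sizes are inferred from the 4 PB and 8 PB totals, the 10 data + 2 parity RAID-6 group layout is one layout that happens to turn 2,000 data drives into 2,400 total, and the 3% annual failure rate is a placeholder rather than a measured value. The function names are hypothetical.

```python
import math

def archive_capacity_pb(data_drives, drive_tb):
    """Usable capacity in PB, using decimal units (1 PB = 1,000 TB)."""
    return data_drives * drive_tb / 1000

def raid6_total_drives(data_drives, data_per_group=10, parity_per_group=2):
    """Total drives needed when every RAID-6 group adds two parity drives.

    The 10+2 group width is an assumption chosen to match the
    2,000 -> 2,400 drive figure in the text.
    """
    groups = math.ceil(data_drives / data_per_group)
    return groups * (data_per_group + parity_per_group)

def expected_failures_per_year(total_drives, annual_failure_rate):
    """Expected drive failures per year for a given annual failure rate."""
    return total_drives * annual_failure_rate

DATA_DRIVES = 2000
ASSUMED_AFR = 0.03  # placeholder annual failure rate, not from the article

print(archive_capacity_pb(DATA_DRIVES, 2))   # capacity with 2 TB drives
print(archive_capacity_pb(DATA_DRIVES, 4))   # capacity with 4 TB drives
total = raid6_total_drives(DATA_DRIVES)
print(total)                                 # drives including parity
print(expected_failures_per_year(total, ASSUMED_AFR))
```

Under the placeholder failure rate, an archive of this size would see more than one drive failure per week, so rebuilds become a routine operating condition instead of an exception.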
Read the rest at Enterprise Storage Forum.