Sunday, April 18, 2021

Planning for Disaster Recovery for Big Archives

Data storage expert Henry Newman details the unique problems in planning for disaster recovery for large archives — a growing concern, given the exponential growth in today’s data archives.

Disaster recovery (DR) is often discussed in broad terms throughout the storage industry, but in this article I will explore a specific segment of the overall market: DR planning for large archives.

This is the first article in a two-part series on archives. The next article will cover architectural planning for large archives.

First, what are my definitions of an archive, and what is a large archive? An archive is a repository of information that is saved, but where most of the information is infrequently accessed.

The definitions of archives have changed recently. Just three or four years ago, archives were always on tape, with only a small disk cache (usually less than 5% of the total capacity). The software to manage data on tape and/or disk is called hierarchical storage management (HSM) and was developed for mainframes more than 35 years ago.

Today we have large disk-based archives that back up data over networks. For example, both my work PC and home PCs are backed up via the internet, and large cloud-based archives are common today. There is of course a question of reliability (see “Cloud Storage Will Be Limited By Drive Reliability, Bandwidth”), but that is a different topic.

My definition of a large archive is fairly simple: anything over 2,000 SATA disk drives. Today, that is about 4 PB, and next year it will likely be 8 PB when drive capacities increase. I chose 2,000 drives as the threshold because of the expected failure rate at that scale. Even in a RAID-6 configuration, which would require 2,400 drives, managing that many drives for a single application will be challenging given the rebuild times involved.
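To see why 2,000 drives is a sensible threshold, here is a minimal back-of-the-envelope sketch. The annual failure rate (AFR) and the 10+2 RAID-6 stripe width are illustrative assumptions, not figures from the article; only the 2,000-drive count and the resulting 2,400-drive RAID-6 total come from the text.

```python
# Sizing math behind the "2,000 drives" threshold for a large archive.
# AFR and stripe width below are assumptions for illustration.

DRIVE_COUNT = 2_000                 # data drives in the archive (from the article)
DRIVE_TB = 2                        # ~2 TB SATA drives, consistent with ~4 PB total
AFR = 0.03                          # assumed 3% annual failure rate per drive
RAID6_DATA, RAID6_PARITY = 10, 2    # assumed 10+2 RAID-6 stripe

usable_pb = DRIVE_COUNT * DRIVE_TB / 1_000
total_drives = DRIVE_COUNT * (RAID6_DATA + RAID6_PARITY) // RAID6_DATA
failures_per_year = total_drives * AFR

print(f"Usable capacity:        {usable_pb:.1f} PB")
print(f"Drives incl. parity:    {total_drives}")
print(f"Expected failures/year: {failures_per_year:.0f} "
      f"(~{failures_per_year / 52:.1f} per week)")
```

Under these assumptions the archive holds 4 PB usable across 2,400 physical drives, and the operator should expect a drive failure (and hence a lengthy RAID-6 rebuild) more than once a week, every week, for the life of the system. That steady drumbeat of rebuilds is what makes DR planning at this scale qualitatively different.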

Read the rest at Enterprise Storage Forum.
