Over the last few months, the issue of long-term tape archiving has come up in my work several times.
Each time the issue is brought up, I hear from supposedly smart people, “We will put it on tape and migrate it in 15 years, one-half of the shelf life of the tape.”
The people saying this are in technology-smart positions (VP of Engineering and Technology for Blah Blah, etc.). Often I hear this from organizations that have large digital archives, or large analog archives that they plan to digitize. The problem is that you can’t always take a tape that is 20 years old, with 20-year-old data on it, find a working drive that will read it, and migrate the data to a new type of media.
Twenty years ago, the IBM 3480 cartridge was introduced, and if you had written the data from an IBM mainframe running MVS, you would likely still be able to read it today. But what if you had waited two to five years and then written it on a Sun workstation running SunOS 4.0, or a Cray X-MP, or something else? Do you think you’d be able to read it today? Perhaps, but it is not very likely, and even if you could read it, how much would it cost to read the Cray X-MP data?
In one sense, I am making a case for mainframes, because mainframe technology, rightly or wrongly, does not change quickly. Change in the mainframe market is evolutionary, not revolutionary. Think back 20 years: there were multiple mainframe vendors besides IBM, including Univac, CDC and others, plus the Japanese and Amdahl clones. These vendors have for the most part disappeared, but mainframe technology stays backwards-compatible for much, much longer than open systems, be they Unix, Windows or whatever.
There are multiple issues surrounding long-term archival storage that must be considered if you hope to navigate these treacherous waters successfully. Let’s take a closer look at them and outline the issues you need to think about carefully. With more and more regulations requiring that data be retained long-term, the topic is an important one.
Issue 1: Getting the data to the media
I have repeatedly discussed recovered and unrecovered bit error rates for tape, and the various bit error rates for SCSI and SATA disk drives, but what about the network to the disk and/or tape device? What is the bit error rate for Fibre Channel? The recovered bit error rate at the hardware level is about 10^-12, or roughly one error in every 10^12 bits. This is 100,000 times greater than the bit error rate on enterprise tape media. This might be why new standards for end-to-end data reliability are emerging that use an 8-byte checksum from the HBA to the storage device. Look at www.t10.org and search for end-to-end data protection. We are still a few years away, but this will encompass data checking from the host to the device.
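To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes one error per 10^12 bits for Fibre Channel, derives the tape figure (one error per 10^17 bits) purely from the 100,000x ratio quoted above, and uses an arbitrary 1 PB archive as the example size. The function and variable names are made up for illustration.

```python
# Rough estimate of how many bit errors to expect when moving an archive,
# given a bit error rate expressed as "one error per N bits".

FC_BITS_PER_ERROR = 10**12            # Fibre Channel figure from the text
TAPE_BITS_PER_ERROR = 10**12 * 10**5  # assumption: 100,000x better than FC

def expected_bit_errors(num_bytes, bits_per_error):
    """Expected number of bit errors for a transfer of num_bytes bytes."""
    return (num_bytes * 8) / bits_per_error

ARCHIVE_BYTES = 10**15  # 1 PB, an arbitrary example size

fc_errors = expected_bit_errors(ARCHIVE_BYTES, FC_BITS_PER_ERROR)
tape_errors = expected_bit_errors(ARCHIVE_BYTES, TAPE_BITS_PER_ERROR)

print(f"Fibre Channel: about {fc_errors:,.0f} expected bit errors")
print(f"Tape media:    about {tape_errors:.2f} expected bit errors")
```

Under these assumptions, pushing 1 PB through the channel means on the order of 8,000 bit errors that the hardware has to catch and recover, against well under one on the media itself, which is why checking the data along the whole path from host to device matters.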