Saturday, April 20, 2024

How Open Source Could Drive a Tape Storage Comeback


The death of tape storage has been loudly proclaimed for what seems like decades now. But a recent breakthrough in tape management could breathe new life into the long-running technology and change archiving and even long-term backup as we know it.

The breakthrough is a technology called the Linear Tape File System (LTFS), which IBM developed and then released as open source. LTFS is a self-describing tape format: tapes written in this common format can be read anywhere without any application other than the open source LTFS software. It also indexes each tape's contents and makes them searchable, so the files you need can be retrieved much faster. The LTFS specification is published as a PDF.

Think about what the end of proprietary tape formats could mean for tape. You could ship a box of a hundred LTO-5 tapes (150 TB uncompressed) from Maine to California overnight, load them into your library and read them into your system without needing a special application to ingest them. Moving that much data over a network is hard even with an OC-768 channel: at its full line rate of roughly 40 Gbit/s, 150 TB would take more than eight hours, and sustained throughput over a long-haul link is typically well below line rate, so in practice the transfer takes considerably longer, on a dedicated link few sites can afford.
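The arithmetic behind that comparison can be sketched in a few lines of Python. The 1.5 TB native LTO-5 capacity and the OC-768 line rate (768 × 51.84 Mbit/s) are standard figures; the 50% and 25% efficiency factors are illustrative assumptions standing in for real-world protocol and link overhead, not measurements.

```python
# Back-of-the-envelope: how long would 150 TB take over one OC-768 link?
LTO5_NATIVE_BYTES = 1.5e12        # LTO-5 native (uncompressed) capacity, 1.5 TB
TAPES = 100
OC768_BITS_PER_SEC = 39.81312e9   # OC-768 SONET line rate, about 39.8 Gbit/s


def transfer_hours(total_bytes: float, bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Hours needed to move total_bytes when only `efficiency` of the line rate is usable."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return total_bytes * 8 / (bits_per_sec * efficiency) / 3600


total_bytes = LTO5_NATIVE_BYTES * TAPES   # 1.5e14 bytes = 150 TB

# 100% is the theoretical ceiling; the lower figures are assumed, illustrative values
for eff in (1.0, 0.5, 0.25):
    hours = transfer_hours(total_bytes, OC768_BITS_PER_SEC, eff)
    print(f"{eff:.0%} of line rate: {hours:.1f} hours")
```

At the full line rate this works out to about 8.4 hours; if only a quarter of the line rate can be sustained, it grows to about 33.5 hours, before counting the cost of leasing such a link.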

You could instead buy the same application for both ends so that each side knows what is on the tapes, but that still would not give you the open format that LTFS provides, which lets the receiving site ingest the data with nothing more than LTFS itself.

I think LTFS will matter most for archiving, although it helps backup too: you would no longer need to worry about formats changing if you ever have to read a 20-year-old tape.

Today most archival formats from HSMs (hierarchical storage management systems) are proprietary, and even the non-proprietary ones use something like tar or gtar. With LTFS you can easily determine all the files on a tape without knowing anything about the tape; of course, you would still need to know how to interpret the files.
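To make "self-describing" concrete: an LTFS tape carries an XML index listing every directory and file on it. The sketch below parses a hand-written, heavily simplified index with Python's standard `xml.etree.ElementTree` module and lists each file's path and length. The element names (`ltfsindex`, `directory`, `contents`, `file`, `name`, `length`) are modeled on the LTFS specification, but real indexes carry many more fields (extents, timestamps, volume UUIDs), and the file names and sizes here are made up, so treat this as an illustration and check the specification for the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

# Simplified, hypothetical LTFS-style index; real indexes contain many more fields.
SAMPLE_INDEX = """<ltfsindex version="2.2.0">
  <directory>
    <name>LTFS Volume</name>
    <contents>
      <file><name>README.txt</name><length>2048</length></file>
      <directory>
        <name>survey</name>
        <contents>
          <file><name>line001.segy</name><length>734003200</length></file>
          <file><name>line002.segy</name><length>812646400</length></file>
        </contents>
      </directory>
    </contents>
  </directory>
</ltfsindex>"""


def walk(directory: ET.Element, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (path, length) for every file below an index <directory> element."""
    contents = directory.find("contents")
    if contents is None:
        return
    for child in contents:
        path = f"{prefix}/{child.findtext('name', default='')}"
        if child.tag == "file":
            yield path, int(child.findtext("length", default="0"))
        elif child.tag == "directory":
            yield from walk(child, path)   # recurse into subdirectories


def list_files(index_xml: str) -> List[Tuple[str, int]]:
    """List every file recorded in the index, without reading any file data."""
    root = ET.fromstring(index_xml)
    top = root.find("directory")           # the volume's root directory
    return list(walk(top)) if top is not None else []


for path, length in list_files(SAMPLE_INDEX):
    print(f"{length:>12}  {path}")
```

The point is that the file listing comes entirely from the index on the tape, with no knowledge of the application that wrote it; interpreting what is inside `line001.segy` is still up to you.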

LTFS is new, and only a few storage products use it so far, but that list is growing, as is support from the hardware vendor community, so more products should be coming along soon.

From what I can tell, LTFS may come with some limitations and may shift work onto the applications that write the tapes. Here are a couple of things you should ask your LTFS vendor:

1) Because LTFS writes whole files to a tape, are your files small enough to fit on a single tape? Right now for LTO-5, that limit is about 1.5 TB uncompressed. In some applications this is an issue because the files exceed that size; the oil exploration industry is a good example. In that case, the file has to be broken up into chunks. Some LTFS applications may do this for you, but then you need that same application to stitch the chunks back together.

2) The second issue I see is small-file aggregation. Given how tapes work, you need to write large amounts of data at a time. When there are only small files, backup and archive applications aggregate them so that they have larger blocks to read or write. With LTFS, because each file is recorded individually on the tape, applications that write small files need some way to write large numbers of them at once. This is an important question to ask your application vendor.
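Both questions come down to how an application turns a list of files into writes. Here is a minimal sketch of a hypothetical write planner, assuming a per-piece limit of 1.4 TB (an assumed margin under LTO-5's 1.5 TB native capacity) and a 256 MB batch target. It splits oversized files into ordered chunks that can each fit on one tape, records each chunk's offset so the file can be stitched back together, and groups small files into batches so they reach the drive as one long run of writes. It only plans; it does not assign pieces to particular tapes or talk to an LTFS volume, and the names `Piece`, `plan_writes` and `stitch_order`, like the file names, are illustrative.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Piece(NamedTuple):
    """One object to write: a whole small file, or one chunk of a large file."""
    path: str     # original file
    offset: int   # byte offset of this piece within the original file
    length: int   # number of bytes in this piece


def plan_writes(files: Iterable[Tuple[str, int]],
                max_piece_bytes: int,
                batch_bytes: int) -> List[List[Piece]]:
    """Split files larger than max_piece_bytes into ordered chunks (issue 1) and
    group pieces into batches of at least batch_bytes (issue 2); the final batch
    may be smaller."""
    batches: List[List[Piece]] = []
    current: List[Piece] = []
    current_bytes = 0
    for path, size in files:
        # range(0, 1, n) still yields offset 0, so an empty file gets one piece
        for offset in range(0, max(size, 1), max_piece_bytes):
            piece = Piece(path, offset, min(max_piece_bytes, size - offset))
            current.append(piece)
            current_bytes += piece.length
            if current_bytes >= batch_bytes:   # enough data for one long write
                batches.append(current)
                current, current_bytes = [], 0
    if current:
        batches.append(current)
    return batches


def stitch_order(batches: List[List[Piece]], path: str) -> List[Piece]:
    """Return the pieces of one original file in the order needed to rebuild it."""
    pieces = [p for batch in batches for p in batch if p.path == path]
    return sorted(pieces, key=lambda p: p.offset)


TB, MB, KB = 10**12, 10**6, 10**3
files = [("survey/line001.segy", 4 * TB)]                        # larger than one LTO-5 tape
files += [(f"scans/img{i:04d}.dcm", 500 * KB) for i in range(1000)]
batches = plan_writes(files, max_piece_bytes=1_400 * 10**9, batch_bytes=256 * MB)
print(len(batches))
print([p.length for p in stitch_order(batches, "survey/line001.segy")])
```

With these inputs the 4 TB file becomes three pieces (1.4, 1.4 and 1.2 TB), each in its own batch, and the 1,000 small files are grouped into two batches of 512 and 488 files.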

Though LTFS is going to be a breath of fresh air for the archive market, and I hope someday for the long-term backup market, no single LTFS application is going to solve every problem. Every LTFS-written tape can be read on any system anywhere, but the applications that vendors build to write data and manage the files on those tapes are likely to differ widely. Policies for keeping certain files together may or may not be important to you. In the medical industry, for example, keeping all of a patient's files on one tape might be important, and you might have to reserve space on that tape for records from future doctor and hospital visits. That is just one example of where policy management of LTFS files will matter.
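The patient example describes a placement policy. Here is a minimal sketch of one such policy, assuming a simple first-fit rule: all files in a group (for instance, one patient) stay on the same tape, new groups may only use capacity outside a reserved fraction, and groups already on a tape may grow into that reserve. The `Tape` class, the `place` function, the 20% reserve and the tape labels are hypothetical illustrations, not features of LTFS or of any vendor's product, and capacities are in arbitrary units.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tape:
    label: str
    capacity: int                                         # arbitrary units, e.g. GB
    reserve_fraction: float = 0.2                         # assumed headroom for future records
    groups: Dict[str, int] = field(default_factory=dict)  # group -> units stored

    @property
    def used(self) -> int:
        return sum(self.groups.values())

    def room_for_new_group(self) -> int:
        # New groups may not dip into the reserved headroom
        return self.capacity - round(self.capacity * self.reserve_fraction) - self.used

    def room_for_existing_group(self) -> int:
        # Groups already on this tape may grow into the reserve
        return self.capacity - self.used


def place(tapes: List[Tape], group: str, size: int) -> str:
    """Hypothetical policy: keep every file of a group on one tape (first fit)."""
    for tape in tapes:
        if group in tape.groups:
            if size > tape.room_for_existing_group():
                raise RuntimeError(f"{tape.label} has no room left for {group}")
            tape.groups[group] += size
            return tape.label
    for tape in tapes:
        if size <= tape.room_for_new_group():
            tape.groups[group] = size
            return tape.label
    raise RuntimeError(f"no tape can start a new group for {group}")


tapes = [Tape("TAPE01", 100), Tape("TAPE02", 100)]
visits = [("patient-A", 50), ("patient-B", 40), ("patient-A", 30), ("patient-C", 25)]
print([place(tapes, g, s) for g, s in visits])
```

Here patient-B does not fit into TAPE01's unreserved space, so it starts on TAPE02, while patient-A's follow-up visit still lands on TAPE01 by growing into the reserve. A real product would also need a rule for a group that outgrows its tape, which this sketch simply reports as an error.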

I believe that LTFS will change the way tapes are viewed and allow many more people to use tape. Today there are some 20 or 30 tape backup applications with various formats, some of them historical, and many archive applications with proprietary formats. In my opinion, this vendor lock-in has been detrimental to tape, because you need to know which application wrote a tape and make sure you still have that application.

I know some people who make a living decoding tapes. They keep what I call a Noah’s Ark of applications and systems for reading tapes, with versions of applications from the early 1990s to today. Keeping all that hardware up and running is no easy task. This is why LTFS is a breakthrough that the storage community should embrace.

I have said for many years that tape is not dead, despite loud claims to the contrary. Now tape has a common interface and interchange format for files, similar to what disk drives offer, plus the added advantages of lower cost, portability and long-term storage. Maybe it is disk systems that should be worried this time.

Henry Newman, CTO of Instrumental Inc., has worked with high-performance computing and storage systems for 30 years.
