Tuesday, March 19, 2024

Data Deduplication: Two Methods


Data deduplication sounds like an excellent candidate for an HGTV reality show: companies drowning in a sea of redundant data receive a visit from two perky IT people who descend on their files like vultures on roadkill, promising to cure the company’s duplicate-data ills with a weekend and a few thousand dollars’ worth of software.

I agree that it doesn’t sound like the best premise for a new show, but if you’re looking for a new money-saving topic to discuss at the conference table next week, toss out the concept of data deduplication. Yes, it’s a mouthful to say but that mouthful might save you a handful — a handful of dollars, that is.

The data deduplication process involves removing copies of files and replacing those duplicates with pointers back to the original copy. Removing multiple copies frees up valuable storage space, makes backups smaller and faster, and reduces network traffic for over-the-network backups. Add the three together and you have significant money savings.
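As a rough illustration of the idea, here is a minimal Python sketch of file-level deduplication. It is not how any particular product works: the `deduplicate` function, the in-memory `files` dictionary, and the sample paths are all hypothetical, and SHA-256 is simply one reasonable choice of fingerprint. Each unique piece of content is kept once, and every path becomes a pointer (the content's digest) back to that single copy.

```python
import hashlib


def deduplicate(files):
    """Keep one copy of each unique content; map every path to that copy.

    `files` is a hypothetical in-memory stand-in for a file system:
    a dict mapping path -> bytes.
    """
    store = {}     # digest -> the single retained copy of the content
    pointers = {}  # path -> digest, i.e. a pointer back to the retained copy
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # only the first copy is stored
        pointers[path] = digest
    return store, pointers


def restore(path, store, pointers):
    """Follow a path's pointer to recover the original content."""
    return store[pointers[path]]


files = {
    "reports/q1.doc": b"quarterly numbers",
    "backup/q1-copy.doc": b"quarterly numbers",  # exact duplicate
    "notes/todo.txt": b"buy more disks",
}
store, pointers = deduplicate(files)
bytes_before = sum(len(c) for c in files.values())
bytes_after = sum(len(c) for c in store.values())
print(f"{len(files)} files, {len(store)} unique; {bytes_before} -> {bytes_after} bytes")
```

The two identical documents share one stored copy, so the space consumed shrinks while every original path can still be restored through its pointer.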

Usually the term “deduplication” refers to enterprise storage systems that house huge amounts of data, harboring perhaps tens of thousands of duplicated files. The sheer number of files and possible copies of those files makes the task seem overwhelming, but fortunately there is hope in the form of sophisticated software designed for this purpose.

A Tale of Two Methods

There are two types of data deduplication: source and target. Source-based deduplication takes place as the backup software processes the files, before they are transferred to backup media. This means the deduplication software replaces your current backup software and strategy with one that examines file contents on the fly. As you might expect, source deduplication speeds aren’t stellar (though still better than tape), but the savings come from lower network bandwidth consumption, because fewer files are transferred, and from reduced space on backup media.
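To make the source-side flow concrete, here is a small Python sketch under stated assumptions. The `BackupTarget` class and the `source_side_backup` function are hypothetical stand-ins, not any vendor's API: the client fingerprints each file before transfer, asks the target whether it already holds that content, and sends the bytes only when it does not.

```python
import hashlib


class BackupTarget:
    """Hypothetical backup destination that stores each unique content once."""

    def __init__(self):
        self.blobs = {}          # digest -> content
        self.catalog = {}        # path -> digest
        self.bytes_received = 0  # bytes that actually crossed the "network"

    def has(self, digest):
        return digest in self.blobs

    def put(self, digest, content):
        self.blobs[digest] = content
        self.bytes_received += len(content)

    def record(self, path, digest):
        self.catalog[path] = digest


def source_side_backup(files, target):
    """Hash each file at the source and transfer only content the target lacks.

    Returns the number of files whose contents were actually sent.
    """
    sent = 0
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if not target.has(digest):
            target.put(digest, content)  # new content: transfer it
            sent += 1
        target.record(path, digest)      # duplicates cost only a catalog entry
    return sent


files = {
    "home/alice/handbook.pdf": b"employee handbook v3",
    "home/bob/handbook.pdf": b"employee handbook v3",  # same content, second user
    "home/bob/budget.xls": b"fy24 budget draft",
}
target = BackupTarget()
first_run = source_side_backup(files, target)
second_run = source_side_backup(files, target)  # nothing changed since first run
print(first_run, second_run, target.bytes_received)
```

The hashing at the source is the extra work that slows the backup itself, while the skipped transfers are where the bandwidth and media savings come from. A real product would also have to handle changed files, hash verification, and failures, none of which this sketch attempts.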

Read the rest at ServerWatch.
