Saturday, April 17, 2021

Data Deduplication: Two Methods

Data deduplication sounds like an excellent candidate for an HGTV reality show — companies drowning in a sea of redundant data receive a visit from two perky IT people who descend on their files like vultures on road kill, promising to cure the company’s duplicate data ills with a weekend and a few thousand dollars’ worth of software.

I agree that it doesn’t sound like the best premise for a new show, but if you’re looking for a new money-saving topic to discuss at the conference table next week, toss out the concept of data deduplication. Yes, it’s a mouthful to say but that mouthful might save you a handful — a handful of dollars, that is.

The data deduplication process involves removing copies of files and replacing those duplicates with pointers back to the original copy. Removing multiple copies frees up valuable storage space, makes backups smaller and faster, and reduces network traffic for over-the-network backups. Add the three together and you have significant money savings.
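To make that idea concrete, here is a minimal sketch in Python of file-level deduplication under some simplifying assumptions: it hashes each file’s contents and replaces duplicates with hard links, which serve as the “pointers back to the original copy.” The function name and directory path are illustrative only; a real deduplication product would chunk and stream data, verify matches beyond a hash comparison, and handle permissions, symlinks, and cross-filesystem cases far more carefully.

```python
import hashlib
import os

def dedupe_directory(root):
    """Walk a directory tree, find files with identical contents,
    and replace each duplicate with a hard link to the first copy seen."""
    seen = {}  # content hash -> path of the first (original) copy
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                # Duplicate content: drop the copy and point back to the original.
                os.remove(path)
                os.link(seen[digest], path)
            else:
                seen[digest] = path

# Usage (hypothetical path): dedupe_directory("/srv/shared/docs")
```

Every duplicate that becomes a pointer is space you no longer store, back up, or push across the network, which is where the three savings above come from.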

Usually, the term “deduplication” refers to enterprise storage systems that house huge amounts of data harboring perhaps tens of thousands of duplicated files. The sheer number of files and possible copies of those files makes the task seem overwhelming, but fortunately, there is hope in the form of sophisticated software designed for this purpose.

A Tale of Two Methods

There are two types of data deduplication: source and target. Source-based deduplication takes place as the backup software processes the files prior to transfer to media. This means the deduplication software replaces your current backup software and strategy with one that examines file contents on the fly. As you might expect, source deduplication speeds aren’t stellar (though still better than tape), but savings come in the form of less network bandwidth consumed, since duplicate content is never transferred, and reduced space on backup media.
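As a rough illustration of that “on the fly” examination, the Python sketch below shows the general shape of source-based deduplication: the backup client hashes each file before transfer and only ships content the backup target does not already hold. The server methods (has_block, store_block, record_reference) are hypothetical placeholders standing in for whatever protocol a real product uses, not any particular vendor’s API.

```python
import hashlib

def source_side_backup(file_paths, server):
    """Sketch of source-based deduplication: hash content at the source and
    transfer a file only when the backup target does not already store it."""
    for path in file_paths:
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        if server.has_block(digest):
            # Target already holds this content: send only a reference (pointer).
            server.record_reference(path, digest)
        else:
            # New content: transfer it once, then record the reference.
            server.store_block(digest, data)
            server.record_reference(path, digest)
```

The hashing work happens on the client, which is why source-based speeds lag, but the payoff is that duplicate data never crosses the wire and never lands on the backup media a second time.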

Read the rest at ServerWatch.
