Sunday, May 19, 2024

The Data Deduplication Market


The data deduplication market addresses two primary needs: reducing the storage space consumed by data backups and cleaning up databases. Although the two differ radically in execution and motivation, both aim to reduce costs and increase efficiency.

Data deduplication solutions range from local hardware for enterprise backup to cloud-based database cleanup tools targeted at specific applications. Demand for these products is global and continues to grow.


The Market for Data Deduplication

Analyst projections for the global data deduplication market vary widely. Estimates of the compound annual growth rate (CAGR) range from 8.5% to 27.8%, and market-size forecasts range from $8.92 billion in 2025 to as high as $353 billion in 2028.

Backup data deduplication dominates the current market and consists of both hardware and software solutions. Hardware solutions incorporate deduplication software into the racks of storage media to reduce the data to be copied and stored with each backup. Software solutions operate more flexibly and can also be applied to deduplicate backup data destined for cloud storage.

Some offerings blend hardware and cloud components, sometimes from different vendors. For example, FalconStor Software combined its hybrid cloud software with hardware flash-drive arrays from Hitachi Vantara to create a high-speed data recovery solution. As the market matures and customer needs become more sophisticated, we can expect the lines between categories to blur.

Although a smaller part of the market, database cleanup is quickly growing in importance. With the rising significance of data mining, databases need to be in good condition to reduce processing times and to avoid biased artificial intelligence (AI) algorithms. Database cleanup software deduplicates the data from databases, reducing their size and improving the quality of the data.

Due to the high initial investment costs for manufacturing, the hardware market tends to be dominated by a small number of large companies, such as IBM, Barracuda, ExaGrid, Dell Technologies, and Veritas.

The software market can be easier to enter and has many more competitors. While there are some larger players, such as Validity and DQ Global, there are also many software specialists, offering solutions to deduplicate specific applications, such as Salesforce.

Data Deduplication Features

Data deduplication compares new data against existing data and discards duplicates. Data deduplication techniques use various hashing algorithms (MD5, SHA-256, etc.) to reduce a piece of data, whether an entire hard drive or a single database entry, to a single fixed-length value. New data is hashed by the same algorithm and compared against the stored hash values. A matching hash marks the data as a duplicate to be discarded. If there is any difference in the data, the hash values will not be the same, and the data will be kept.
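As a rough illustration of the hash comparison described above, the following minimal sketch uses SHA-256 from Python's standard `hashlib` module. The function name `deduplicate` and the sample blocks are assumptions made for this example and do not reflect any vendor's implementation.

```python
import hashlib

def deduplicate(blocks):
    """Keep the first copy of each distinct block, judged by its SHA-256 digest."""
    seen = set()   # digests of blocks that have already been stored
    unique = []    # blocks that were kept
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in seen:
            continue  # identical content is already stored, so discard this copy
        seen.add(digest)
        unique.append(block)
    return unique

# Hypothetical sample data: the third block repeats the first exactly,
# while the fourth differs only in capitalization.
blocks = [b"alpha", b"beta", b"alpha", b"Alpha"]
print(deduplicate(blocks))  # [b'alpha', b'beta', b'Alpha']
```

Note that `b"Alpha"` is kept: an exact hash treats even a one-character difference as different data.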

For data backup solutions, the huge data sizes may lead to enormous tables of hash values which will drag down performance. Some solutions break the data into larger data chunks to create smaller hash lists and improve performance for hash-value comparisons. Other solutions may instead break the data into smaller pieces to improve accuracy and ability to recover data more granularly. Each option delivers a specific advantage for specific customer needs.
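The trade-off between chunk sizes can be sketched with simple fixed-size chunking. This is an illustrative example only: the helper names `chunk_index` and `stored_bytes` and the synthetic data are assumptions, and commercial products may use different chunking strategies.

```python
import hashlib

def chunk_index(data: bytes, chunk_size: int) -> dict:
    """Split data into fixed-size chunks and map each SHA-256 digest to its chunk."""
    index = {}
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # setdefault stores a chunk only the first time its digest is seen
        index.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return index

def stored_bytes(index: dict) -> int:
    """Total bytes that must be kept after deduplication."""
    return sum(len(chunk) for chunk in index.values())

# Hypothetical backup stream: a 4,096-byte block with no repeated 256-byte
# chunks, written four times, with one byte changed in the final copy.
block = b"".join(i.to_bytes(2, "big") for i in range(2048))
data = block * 3 + block[:-1] + b"\x00"

for size in (4096, 256):
    index = chunk_index(data, size)
    print(size, len(index), stored_bytes(index))
```

On this synthetic data, 4,096-byte chunks yield an index of only 2 hashes but store 8,192 bytes, while 256-byte chunks need 17 hashes but store only 4,352 bytes, because the single changed byte invalidates one small chunk instead of a whole large one.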

For database deduplication, the hashing algorithm may ignore capital letters or use fuzzy logic, so it can recognize similar entries as identical. For example, a marketing database would want to create a single entry out of addresses that are the same but are spelled out or capitalized differently.
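One way to picture this kind of matching is to normalize each entry before hashing it and to fall back on a similarity score for near misses. The sketch below uses Python's standard `re`, `hashlib`, and `difflib` modules. The abbreviation table, the 0.9 threshold, and the function names are assumptions chosen for illustration, not settings from any particular product.

```python
import hashlib
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real cleanup tool would use a far larger one.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and spell out known abbreviations."""
    words = re.sub(r"[^\w\s]", "", address.casefold()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def record_key(address: str) -> str:
    """Hash the normalized form so differently written entries share one key."""
    return hashlib.sha256(normalize(address).encode("utf-8")).hexdigest()

def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy comparison: treat entries as duplicates above a similarity threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(record_key("123 Main St.") == record_key("123 MAIN STREET"))   # True
print(is_probable_duplicate("123 Main Street", "123 Main Stret"))    # True
print(is_probable_duplicate("123 Main Street", "45 Oak Avenue"))     # False
```

The normalized hash catches entries that differ only in capitalization or abbreviation, while the similarity check catches misspellings that no exact hash would match.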

Other options allow deduplication between different database fields, such as a phone number field and a cellular phone number field. Some products also support custom rules, where the client creates their own deduplication criteria. Software solutions can be deployed locally, in the cloud, or even within software-as-a-service (SaaS) cloud applications.
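Cross-field matching and custom rules might look something like the following sketch, in which each rule is a function that decides whether two records describe the same contact. The field names (`phone`, `cell_phone`), the rule `shared_phone`, and the sample records are all hypothetical.

```python
import re

def phone_digits(value) -> str:
    """Reduce a phone number to its digits; missing values become an empty string."""
    return re.sub(r"\D", "", value or "")

def shared_phone(a: dict, b: dict, fields=("phone", "cell_phone")) -> bool:
    """Rule: two records match if any phone field of one equals any phone field of the other."""
    numbers_a = {phone_digits(a.get(f)) for f in fields} - {""}
    numbers_b = {phone_digits(b.get(f)) for f in fields} - {""}
    return bool(numbers_a & numbers_b)

def deduplicate_records(records, rules):
    """Keep a record only if no rule matches it against an already kept record."""
    kept = []
    for record in records:
        if any(rule(record, existing) for existing in kept for rule in rules):
            continue
        kept.append(record)
    return kept

# Hypothetical contact records: the first two share a number across different fields.
records = [
    {"name": "Ann Lee", "phone": "(555) 010-2000", "cell_phone": ""},
    {"name": "A. Lee", "phone": "", "cell_phone": "555-010-2000"},
    {"name": "Bo Cruz", "phone": "555-010-3000", "cell_phone": None},
]
print([r["name"] for r in deduplicate_records(records, [shared_phone])])
# ['Ann Lee', 'Bo Cruz']
```

Because `rules` is just a list of functions, a client-specific criterion can be added by writing another function with the same two-record signature.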

Data Deduplication Benefits

Data backups can require the same data to be copied several times a week or even several times a day. These backups begin to consume enormous space, and copying the data can be very time consuming. Additionally, if an incident occurs, digging through backups to locate and extract data for restoration can also be time consuming.

Using data deduplication for backups can result in huge operational and cost savings:

  • Extend the life of hard drives by reducing the number of required read/write processes
  • Increase the frequency or number of backups possible with the existing resources
  • Improve speed and reliability and simplify the process for data recovery
  • Reduce storage space, bandwidth, and costs for existing backup processes
  • Speed up backups
  • Store more data in existing storage solutions

Database deduplication offers similar benefits:

  • Eliminate internal conflicts or embarrassment caused by data overlap
  • Improve reporting clarity and accuracy
  • Reduce the size and increase the efficiency of databases

Data Deduplication Use Cases

Companies often adopt data deduplication to accomplish a specific goal such as reduced time for data recovery or reduced storage costs. However, they often discover that solving one problem also provides many side benefits.

Data integrity, backup speeds, network bandwidth for backups, data recovery time, storage costs, and IT administration all directly relate to the amount of data. Using deduplication reduces the data and proportionally reduces all associated costs.

Ahearn & Soper

The barcode software and hardware producer Ahearn & Soper (A&S) found that its existing tape backup system could retain only a few months of data, so it sought to upgrade. The switch to ExaGrid with data deduplication not only extended available data recovery points from two months to two years' worth of data but also delivered many other unexpected benefits.

A&S cut a two- to three-hour data recovery time to a few minutes and dramatically reduced the IT work required for backups. Once the company virtualized its data center and networks, it realized even more savings.

“After improving our networking and our data center systems, the efficiency has gone up tenfold,” says William Rosenblath, an IT professional at A&S.

“We used to have a goal of just getting our daily [incremental backups] done overnight, and now they’re usually completed within an hour or two. And now we’re backing up system images instead of files.”

KCC Engineering & Construction

KCC Engineering & Construction Co. (KCC) backed up 10 TB of data to tape, but both backups and recoveries from tape media took too long to be useful.

“We benchmarked the Veritas NetBackup [hard drive] appliance for one month and got higher than expected deduplication and speed,” says Kilho Lee, manager of KCC’s IT team.

“A full backup of 1.5 terabytes worth of files … took three hours … Then, we used the Veritas NetBackup Accelerator [using data deduplication], and it took approximately 20 minutes — almost 10 times faster.”

Deduplication reduced data by 90%, so each backup of 20 TB of company data now consumes only 2 TB of space. Backup time has also been reduced by 90%, and weekly IT work on backups has been reduced by 50%. The project’s expected payback period is four years.
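The relationship between a deduplication ratio and the percentage of space saved is simple arithmetic, sketched here with the KCC storage figures quoted above. The function names are assumptions made for this example.

```python
def dedup_ratio(logical_tb: float, stored_tb: float) -> float:
    """Deduplication ratio: logical data size divided by the space actually consumed."""
    return logical_tb / stored_tb

def space_saved_percent(logical_tb: float, stored_tb: float) -> float:
    """Share of the logical data size that no longer needs to be stored."""
    return 100 * (logical_tb - stored_tb) / logical_tb

# KCC figures from the text: 20 TB of company data consumes 2 TB per backup.
print(dedup_ratio(20, 2))          # 10.0, i.e. a 10-to-1 ratio
print(space_saved_percent(20, 2))  # 90.0
```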

Texas Christian University

Texas Christian University (TCU) could not meet their backup recovery objective of four hours with existing solutions. So TCU decided to switch to a hybrid backup solution combining Dell Technologies on-premises appliances and cloud resources (Dell and Azure). The solution was accelerated using data deduplication to reduce bandwidth and storage requirements.

“Dell EMC has saved us three times over what we would have spent on data protection storage if we had continued without a viable deduplication solution,” says Craig Carlson, associate director of computer systems at TCU.

With average deduplication ratios of 78 to 1, network and backup storage requirements are reduced by 99.34%, and backup times have been reduced dramatically.

Data Deduplication Providers

Some of the leading providers of data deduplication solutions include:

  • Barracuda Networks
  • Dell Technologies
  • DQ Global
  • ExaGrid
  • FalconStor
  • Fujitsu
  • Hitachi
  • IBM
  • Microsoft
  • Nexsan
  • OpenDedup
  • StrategicDB
  • Validity
  • Veritas Technologies
