Sunday, June 16, 2024

Data Protection Complexity Grows Exponentially


Once upon a time we lived in simpler data protection times. You popped in a backup tape, made a few mouse clicks and you were done. You slept well at night knowing that backup tape could be used to recover data in the event of a disaster. Then one day you woke up and realized that just wasn’t going to be good enough anymore.

Since then, you don’t sleep so soundly. You have to encrypt the data, send tapes offsite, install backup appliances, transmit to the cloud, figure out ways to get information to and from the cloud faster (such as WAN optimization), deduplicate and compress, surround your data with dozens of security safeguards to prevent wrongdoers from stealing or corrupting it, institute complex tiering architectures, and add in an archiving layer so data can be retained for prescribed periods.

That same level of complexity was very apparent when Datamation visited EMC World last week. The company addressed data protection from all angles and via various spokespeople. They each had a different take on the area and in many cases, a different product to showcase. Even the snowstorm of press releases announced at the show in seemingly unrelated areas often included a data protection element.

So what is going on with data protection?

Data Protection in the Cloud

Russ Stockdale, Vice President, EMC Protection Cloud (formerly Mozy backup), is all about extending data protection to the cloud. He made the case for the introduction of cloud tiering as a way of storing older data cost-effectively. Why use on-site disk to store data you rarely access when you can do so much more cheaply elsewhere? Mozy, he said, is one option for this. And whereas the company earned its stripes in the consumer market, Stockdale said 75% of its revenue today is from businesses.

The speed of getting data in and out of the cloud, of course, is a common concern. EMC has been addressing this along many vectors. CloudBoost includes Boost technology developed for Data Domain to speed the flow of data by distributing parts of the deduplication process to the backup server or application client, thereby improving backup performance. In essence, it compresses and dedupes data before it is sent to the cloud. This is no small matter – who wants to be paying a cloud provider for the storage of 500 copies of the CEO’s PowerPoint or keynote video?
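The idea of deduplicating and compressing at the source, before anything crosses the WAN, can be illustrated with a minimal Python sketch. This is not EMC's Boost or CloudBoost implementation; the function name, the use of SHA-256 fingerprints and zlib, and the sample data are all assumptions chosen for illustration.

```python
import hashlib
import zlib


def dedupe_and_compress(chunks, already_stored):
    """Return only the chunks the remote store has not seen, compressed.

    chunks: iterable of bytes objects (pieces of backup data).
    already_stored: set of SHA-256 hex digests already held in the cloud.
    (Both names are hypothetical, used only for this sketch.)
    """
    to_upload = {}
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in already_stored or digest in to_upload:
            continue  # duplicate: nothing is sent, the digest is enough
        to_upload[digest] = zlib.compress(chunk)
    return to_upload


# Made-up data: 500 identical copies of one presentation plus one unique file.
slides = b"CEO keynote slides" * 1000
chunks = [slides] * 500 + [b"quarterly report"]
upload = dedupe_and_compress(chunks, already_stored=set())
print(len(chunks), "chunks in,", len(upload), "chunks uploaded")
```

In this toy run, 501 chunks go in and only two are queued for upload, which is the saving the paragraph above describes. Production systems typically use variable-length chunking and a persistent fingerprint index rather than an in-memory set.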

“Duplication of data sent to the cloud is an emerging issue,” said Stockdale.

EMC is also talking up Spanning as a completely different aspect of storage, one that few would consider. What it does is act as a backup for Google Apps, Salesforce and now Microsoft Office365.

“Those products don’t have undelete, so when someone deletes something, it is gone forever,” said Stockdale.

The result is that there is little recourse for data stored in these apps that is either accidentally or maliciously deleted, or perhaps lost due to synchronization errors. Contact some of these vendors to solve the issue and they might either a) point to the small print in the contract or b) offer a consultation service for $10,000 or $20,000 – which might or might not get you the file after many days.

Spanning, therefore, provides a second copy of data that is being stored in Google Apps, Office365 or Salesforce. Its Application Programming Interface (API) hooks into these apps to enable it to make a copy of anything they save.

“Mozy is backup for on-premise data and Spanning is backup for cloud app data,” said Stockdale.

EMC is packaging many of these elements into the EMC Data Protection Suite. It includes backup/deduplication software such as Avamar and NetWorker, Boost and a searchable compliance archive.

Archiving Options

Over the last decade, EMC has accumulated a vast array of diverse products. The big plus is that the company covers all the bases. But the minus is that they don’t always integrate well, and that it all gets a little confusing. Case in point: archiving. The company offers several archiving options, primarily SourceOne and InfoArchive.

SourceOne Archiving, available both on-premises and in the cloud, archives email and information from messaging systems, file servers, collaboration systems and social media. Deduplication is included.

“SourceOne is for shorter-term retention, not for long term archiving,” said Bryant Bell, EMC InfoArchive Product Marketing. “SourceOne operates more at the infrastructure layer and InfoArchive is more at the application layer.”

He said InfoArchive allows a company to keep structured and unstructured data in one repository, as well as providing flexibility in terms of methods of ingest. The software enhances data records with metadata that helps speed its search capabilities.

One of the major features of this tool is that it allows data from diverse sources to be archived as one business object that covers a single event. Take the case of a stock market trade. It might have involved a phone record, emails, the transaction itself and other documents. InfoArchive gathers them all as a record of that one event.

“It is easier to set retention periods against a single object,” said Bell. “This makes it simpler for regulators or lawyers to pull the incident rather than having to look in many different repositories.”
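To make the business-object idea concrete, here is a minimal Python sketch of records from several sources grouped under one event with a single retention date. It does not reflect InfoArchive's actual data model or API; the class, field names, dates and sample records are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BusinessObject:
    """One archived event holding records from many sources (illustrative only)."""
    event_id: str
    retain_until: date  # a single retention date for the whole event
    records: list = field(default_factory=list)

    def add(self, source, content):
        self.records.append({"source": source, "content": content})

    def is_expired(self, today):
        # One check covers every record attached to the event.
        return today > self.retain_until


# Hypothetical stock trade made up of records from three systems.
trade = BusinessObject("trade-0001", retain_until=date(2022, 5, 11))
trade.add("phone", "call record")
trade.add("email", "confirmation message")
trade.add("transaction", "order details")
print(len(trade.records), trade.is_expired(date(2020, 1, 1)))
```

Because the retention date sits on the event rather than on each record, pulling or expiring the incident is one lookup instead of a search across separate repositories.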

He gave the example of Microsoft’s acquisition of Nokia’s devices business. As part of the deal, the company received a number of large databases and Lotus Notes archives which contained data needed for patent protection. When all these aging systems and Notes archives were ingested into InfoArchive, they were combined into related events. This enabled the company to respond to patent infringement demands within a day as opposed to months.

“This reduced legal costs dramatically and eliminated maintenance costs for Lotus and older systems that they thought they would have to maintain to keep these records,” said Bell.

EMC also beefed up its Data Domain appliance line with the new DD9500, which is now its highest-end unit. This latest box comes with new software (DD OS 5.6), which extends backup and deduplication protection to Hadoop and NoSQL environments, including Pivotal HD Enterprise Business Data Lake, Cloudera Enterprise Data Hub, and Hortonworks Modern Data Architecture. Further, it boosts performance to 58.7 TB/hour and capacity to 1.728 PB.

“The Data Domain 9500 has 1.5 times the performance and 4 times capacity of its nearest competitor,” said Guy Churchward, President, Core Technologies Division, EMC.

The snowstorm of data protection-related press releases turned into a blizzard with yet another product to showcase. EMC CloudArray software enables tiering to cloud or object storage from the EMC VMAX3 platform. Churchward said its aim was to lower costs by moving data to lower-cost storage. CloudArray can be used for backup and archive, file or secondary data. 
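The cost argument behind tiering comes down to a placement rule: data that has not been touched for a while moves to the cheaper tier. The Python sketch below shows one such rule. It is not how CloudArray decides placement; the function name, the 90-day threshold and the sample dates are assumptions for illustration.

```python
from datetime import datetime, timedelta


def pick_tier(last_accessed, now, cold_after_days=90):
    """Choose a storage tier from last-access age (threshold is an assumption)."""
    if now - last_accessed > timedelta(days=cold_after_days):
        return "cloud"  # rarely accessed: cheaper cloud or object storage
    return "on-site"    # recently used: keep on the primary array


# Made-up timestamps for two files.
now = datetime(2015, 5, 11)
recent = pick_tier(datetime(2015, 4, 30), now)
stale = pick_tier(datetime(2014, 6, 1), now)
print(recent, stale)
```

A real policy would likely weigh more than age, such as file size, retrieval cost and compliance requirements.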

Better Integration

While efforts are ongoing to package this diverse collection of offerings into such wrappings as the EMC Data Protection Suite, it is clear that EMC has an awful lot of tools in this area, either developed in-house or acquired from a diverse collection of startups and developed on a myriad of different platforms. Some of these products play well together, but others not so much.

Churchward noted this, admitting that this was a major area to address.

“We are best of breed in many areas and need to do better at integrating our various products,” he said.

Photo courtesy of Shutterstock.
