"We leverage a grid architecture to provide significant data reduction with byte-level delta technology," says Dave Therrien, CTO and founder of ExaGrid Systems.
According to Therrien, grid technology offers several benefits for data storage. Availability is one: if one or more nodes fail, the other nodes in the grid take over the task of delivering data to clients and applications. Because resources are virtualized, anywhere from tens to thousands of storage nodes act as a single unified pool, and each storage node can monitor the health of every other node. When one fails, any or all of the remaining nodes can help reconstruct the lost data.
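The article does not say how the surviving nodes rebuild lost data. One common way to make that possible, assumed here purely for illustration, is single-parity XOR striping (the scheme RAID 5 uses inside a disk array) spread across grid nodes. In this minimal Python sketch the function names are hypothetical and each block stands in for the data one node holds; with only one parity block, just one failed node at a time can be rebuilt.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data_blocks: list[bytes]) -> list[bytes]:
    """Place one data block per node and add a parity block for an extra node."""
    if len({len(block) for block in data_blocks}) != 1:
        raise ValueError("all blocks must be non-empty lists of the same length")
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild_missing(stored: list[Optional[bytes]]) -> bytes:
    """Rebuild the one block whose node failed (marked None) from the survivors."""
    lost = [i for i, block in enumerate(stored) if block is None]
    if len(lost) != 1:
        raise ValueError("single XOR parity can rebuild exactly one lost block")
    # XOR of all surviving blocks (data plus parity) equals the missing block.
    return xor_blocks([block for block in stored if block is not None])


# Usage: three data nodes plus one parity node; node 1 fails.
nodes = stripe_with_parity([b"week", b"data", b"grid"])
nodes[1] = None
print(rebuild_missing(nodes))  # b'data'
```

Tolerating more than one simultaneous node failure would require a stronger erasure code than a single XOR parity block.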
Scalability improves, too. Every storage node deployed contains one or more terabytes of disk capacity as well as its own CPU, so processing power grows along with capacity as nodes are added.
That processing power allows advanced data management algorithms to be applied to the data stored on each node, says Therrien.
Byte-level delta data reduction, for instance, is one of many applications that can run within each storage node, allowing every node to reduce its own small subset of the data in parallel with all the others. Many other applications can run in a grid to improve the performance, reliability, integrity and security of data.
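ExaGrid's byte-level delta algorithm is not described in the article. This minimal Python sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in: it expresses a node's new backup data as ranges copied from the previous version plus the literal bytes that changed, and runs that reduction for several nodes' subsets concurrently. The function names are hypothetical, and SequenceMatcher is far too slow for real backup volumes; it only illustrates the idea.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def byte_delta(old: bytes, new: bytes) -> list[tuple]:
    """Describe `new` as ranges copied from `old` plus literal new bytes."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes already stored last time
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only the changed bytes are kept
        # "delete" needs no entry: those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the newer version from the older one and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list[tuple]) -> int:
    """Bytes that must actually be stored for this delta (copy ops are tiny)."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


def reduce_on_all_nodes(node_subsets: list[tuple[bytes, bytes]]) -> list[list[tuple]]:
    """Each (old, new) pair stands in for one node's subset of the backup.

    Threads only model the per-node concurrency; in a real grid each subset
    would be reduced by a separate server's CPU.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pair: byte_delta(*pair), node_subsets))


week1 = b"customer records v1: alice, bob, carol"
week2 = b"customer records v2: alice, bob, carol, dave"
delta = byte_delta(week1, week2)
assert apply_delta(week1, delta) == week2
print(literal_bytes(delta), "of", len(week2), "bytes stored as literals")
```

When little changes between backups, the literal bytes are a small fraction of the new version, which is where the space savings come from.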
As a result, grid technology is being applied to the problems of traditional tape-based backup. By integrating with existing backup applications, for example, a storage grid lets users of Veritas Backup Exec, NetBackup or CA ARCserve direct their backup data to the grid instead of to tape. This is said to improve both the performance and the reliability of backups.
Day-to-day and week-to-week backup data contains a great deal of redundancy, and the processing power of a storage grid can reduce it through compression. Similarly, when backups are replicated, grids can reduce the amount of data that must flow between storage grids at two or more sites over bandwidth-limited WAN links.
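The article does not specify how replication traffic is reduced. As an assumption-labeled sketch, the Python below supposes the remote site already holds last week's backup and compresses this week's backup with zlib, passing last week's copy as a preset dictionary (`zdict`) so that unchanged content travels as short back-references rather than raw bytes. The function names are hypothetical, and deflate's 32 KB window limits this trick to small examples; real backup volumes would need a delta scheme without that limit.

```python
import os
import zlib


def replication_payload(previous: bytes, current: bytes) -> bytes:
    """Compress this backup using the previous one as a preset dictionary.

    Deflate can then encode repeated content as back-references into
    `previous`. Its 32 KB window means only the last 32 KB of `previous`
    can be referenced, so this only illustrates the idea at small sizes.
    """
    compressor = zlib.compressobj(level=9, zdict=previous)
    return compressor.compress(current) + compressor.flush()


def apply_payload(previous: bytes, payload: bytes) -> bytes:
    """At the remote site, which already holds `previous`, rebuild `current`."""
    decompressor = zlib.decompressobj(zdict=previous)
    return decompressor.decompress(payload) + decompressor.flush()


def transfer_seconds(num_bytes: int, link_mbps: float) -> float:
    """Time to send num_bytes over a WAN link of link_mbps megabits per second."""
    return num_bytes * 8 / (link_mbps * 1_000_000)


# Random bytes are incompressible on their own, so any saving here comes
# from the redundancy between last week's and this week's backup.
last_week = os.urandom(16_000)
this_week = last_week[:8_000] + b"new rows" + last_week[8_000:]
payload = replication_payload(last_week, this_week)
assert apply_payload(last_week, payload) == this_week
print(len(zlib.compress(this_week)), "bytes without a reference copy vs", len(payload), "with")
print(transfer_seconds(len(payload), link_mbps=10.0), "seconds on a hypothetical 10 Mbit/s link")
```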
Therrien reports that many ExaGrid customers have seen the benefits of a disk-based grid storage system for retaining their backup data. Without a storage grid to run advanced data reduction applications, he says, customers would have to buy more than 10 times as much lower-cost SATA disk as their total FC RAID primary storage capacity just to hold successive weeks of full backups. He cites an actual example: a customer with 1TB of primary data that wants to retain 13 weeks of backups on disk would need more than 13TB of lower-cost SATA storage. Even though SATA disk costs only 25% as much as traditional FC RAID primary storage, the sheer amount of SATA capacity required makes this backup disk system more expensive than the primary storage it protects.
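The arithmetic behind Therrien's example can be written out. Let $c$ be the cost per terabyte of FC RAID primary storage, so SATA costs $0.25c$ per terabyte, and treat the 13TB figure as a lower bound. The symbol $r$ is a hypothetical data reduction ratio introduced only for illustration, not a figure reported by ExaGrid.

```latex
\frac{C_{\text{backup}}}{C_{\text{primary}}}
  \ge \frac{13\,\text{TB} \times 0.25\,c}{1\,\text{TB} \times c} = 3.25

% A data reduction ratio r shrinks the SATA capacity needed by a factor of r:
\frac{C_{\text{backup}}}{C_{\text{primary}}}
  \ge \frac{13 \times 0.25}{r} = \frac{3.25}{r},
\qquad \text{which can fall below 1 only when } r > 3.25 .
```

Without data reduction, then, the backup tier costs at least 3.25 times as much as the 1TB of primary storage it protects.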
Therrien points out that storage grids are not an innovation that will take hold a year or a decade from now: many large companies are already deploying massive grids for storage. Search engines such as Google, for example, harness storage grids to process search queries in a fraction of the time a single large, monolithic processing engine would need.
Storage grids are also realizing their potential in disk-based backup by allowing data to be reduced across multiple grid nodes in parallel. And they are being used to perform parallel operations on data more efficiently in R&D-intensive, scientific and business environments.
So where will storage grids ultimately end up? Therrien lays out the logic in a way that highlights the power of this approach: the nodes of a grid are servers with some amount of internal disk storage, and one of these servers with 1TB of SATA disk costs about the same as, or less than, a RAID disk subsystem with FC or iSCSI connectivity.
It makes sense, therefore, to deploy each additional terabyte of storage capacity as a server, gaining both scalable processing power and ubiquitous gigabit Ethernet as the grid backplane, says Therrien. The parallel processing power of each grid node will bring new data management features to mainstream IT environments.
This article was first published on EnterpriseITPlanet.com.