Content-addressable storage (CAS) retrieves stored information based on its content rather than its location. CAS initially gained ground as a means of rapidly accessing documents for compliance purposes, but there are indications that it has moved beyond its compliance roots into fields such as financial record-keeping and the preservation of precious documents.
“Interest in CAS has become more widespread due to a couple of factors,” says Moosa Matariyeh, an enterprise storage specialist at CDW Corp. in Vernon Hills, IL. “The first is the explosive growth of information in the datacenter. The second is regulatory obligation to store particular kinds of data.”
He cites CAS deployments as diverse as a healthcare facility storing medical images for HIPAA compliance, an engineering firm that must archive past projects yet still be able to refer to them, and a trading firm that must keep records of all stock trades to comply with SEC regulations.
“Each of these organizations must keep certain data for extended periods of time and must be able to access that data for at least part of its life, quickly,” says Matariyeh. “CAS is not meant to be used for primary storage – it is a means to manage secondary storage effectively.”
His view is supported by a recent survey by Larstan Business Reports of Potomac, MD. The company’s survey of IT executives identified cost reduction, storage manageability, record management, content management and e-mail capture as among the primary motivators for organizations seeking a CAS system. Only 21 percent of respondents were not interested in CAS technologies.
CAS Community
The CAS Community, a consortium of storage vendors and associations including EMC, IBM, Sun, Permabit, Enterprise Strategy Group (ESG) and the Storage Networking Industry Association (SNIA), acts as a central repository for CAS information, best practices and applications.
Rob Peglar, co-chair of the SNIA Data Management Forum’s long-term archive and compliance storage initiative, explains that CAS adds value in compliance through the way it stores objects. CAS devices can deliver object immutability: for most users an object is simply a file, and immutability means it cannot be overwritten. Once an object is stored in CAS, therefore, it is under control, which satisfies both the intent and the letter of many regulations. Peglar also believes that CAS changes the paradigm of storage manageability by changing the nature of what is managed.
When an object is stored, a CAS device returns an object identifier (ID). Managing these IDs therefore becomes the primary task, rather than managing the file metadata (names, directories, permissions, etc.) found in conventional file systems. CAS uses a flat address space, whereas most operating systems implement hierarchical file systems built from names and directories (folders). In some applications, this reduces the overall time users and administrators spend managing files.
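To make the difference concrete, here is a minimal Python sketch of a flat, content-addressed namespace. It assumes SHA-256 as the content hash and an in-memory dictionary as the store; the class `FlatContentStore` and the `catalog` dictionary are hypothetical illustrations, and commercial CAS devices use their own hashing schemes, ID formats and interfaces.

```python
import hashlib


class FlatContentStore:
    """Hypothetical toy content-addressed store with a single flat namespace."""

    def __init__(self) -> None:
        # Flat address space: object ID -> bytes. No directories, no file names.
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The ID is derived from the content itself (SHA-256 is an assumption here).
        object_id = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same ID, so it is stored only once.
        self._objects.setdefault(object_id, bytes(data))
        return object_id

    def get(self, object_id: str) -> bytes:
        # Retrieval is by ID only; the store has no notion of a path.
        return self._objects[object_id]


store = FlatContentStore()

# The application, not the store, keeps the mapping from meaningful names to IDs.
catalog = {}
catalog["radiology/patient-1234/chest-xray"] = store.put(b"<image bytes>")

print(store.get(catalog["radiology/patient-1234/chest-xray"]))  # b'<image bytes>'
```

Because the store knows nothing about names or folders, the application (here, the hypothetical `catalog`) must track which ID belongs to which record, which is the shift in management burden that Peglar describes.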
Record management also differs between CAS devices and conventional arrays. Once a record is stored, it does not change and cannot be overwritten. The process of tracking record modifications over time (who made them, when, and so on) is therefore rendered moot, because there are no modifications over time, per se. Conventional arrays, by themselves, cannot ensure object immutability.
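The immutability argument can be sketched in a few lines as well. The toy Python store below assumes SHA-256 addressing and is not modeled on any particular vendor's product; the class `WriteOnceStore` and the sample trade records are hypothetical. It offers no overwrite operation: changing a record's content necessarily yields a new object ID, the original remains retrievable, and every read re-hashes the data to confirm it still matches its ID.

```python
import hashlib


class WriteOnceStore:
    """Hypothetical write-once, content-addressed store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        object_id = hashlib.sha256(data).hexdigest()
        # Existing objects are never replaced; re-storing identical content is a no-op.
        if object_id not in self._objects:
            # Keep a private immutable copy so later changes to a caller's buffer don't leak in.
            self._objects[object_id] = bytes(data)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._objects[object_id]
        # Integrity check: the content must still hash to the ID it was stored under.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError(f"object {object_id} failed integrity check")
        return data


store = WriteOnceStore()
original_id = store.put(b"BUY 100 XYZ @ 41.20")
# A "correction" is stored as a separate object; the original record is untouched.
corrected_id = store.put(b"BUY 100 XYZ @ 41.25")

print(original_id != corrected_id)  # True
print(store.get(original_id))       # b'BUY 100 XYZ @ 41.20'
```

In this design an audit trail of edits is unnecessary for any single object, since no object can change; relating an original record to its correction is left to the application.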
“SNIA has begun an effort around creating specifications related to CAS devices and the related CAS environment,” says Peglar. “The breakthrough for CAS will be seen when these specifications mature and more entities develop non-proprietary CAS devices and software packages to assist users with the most related areas where CAS can help solve problems, namely record and content management.”
Much of the new work in CAS revolves around intelligent software and APIs that can be embedded in applications and/or operating systems, enabling them to take advantage of CAS methods. As with other techniques that have come to the forefront over time (e.g., storage virtualization), future implementations of CAS will probably appear at the array (subsystem) level, the network level and the host (server) level. This corresponds to the layering spelled out in the SNIA Shared Storage Model.
Matariyeh agrees that standardization is vitally needed. Until vendors agree on a common, non-proprietary means of achieving CAS goals, CAS may struggle to enter the mainstream.
“The technology of CAS has matured and become an extremely reliable means to store and classify data while verifying its integrity,” he says. “But one of the issues that may be brought up about CAS is that each manufacturer uses their own means of storing data. There is no technical standard as of yet.”
This article was first published on EnterpriseITPlanet.com.