Data lifecycle management addresses how to gain control of and capitalize upon the vast amounts of data most organizations possess. Enterprises that can break down their organizational silos and intelligently unify and analyze their data are more competitive and more successful than their peers. Accomplishing those goals requires careful organization of the five different phases that comprise the data lifecycle: creation, storage, usage, archiving, and destruction. This article details those stages and gives best practices for each.
What Is the Data Lifecycle?
Broadly speaking, data lifecycle management is the discipline of ensuring that data is accessible and usable by those who need it from beginning to end. The data lifecycle itself covers all the stages an organization must pass through in its interaction with data, whether financial, customer-focused, or otherwise. Depending on who you ask, there are either five phases to the data lifecycle or eight. The five-stage cycle is the simpler and more common one:
Creation > Storage > Usage > Archiving > Destruction
The eight-stage cycle expands on the five-stage cycle. In this model, “collection” and “processing” break out parts of the Storage phase, while “management,” “analysis,” “visualization,” and “interpretation” break out the Usage and Archiving phases.
Generation > Collection > Processing > Storage >
Management > Analysis > Visualization > Interpretation
Successfully navigating each stage requires consideration for internal processes and users, infrastructure and technology, external regulators and legal authorities, consumer privacy, and more, making data lifecycle management a complex topic touching on many areas of an enterprise’s work. Let’s look at each stage in more detail.
Stage One: Data Creation
Because enterprises take in so much data, it’s easy to take this stage for granted. But consider this: an organization’s data is created on a wide range of devices across many geographies. Done right, this stage gives users the right tools to create data and puts processes in place to ensure that the data can be stored in the appropriate formats and types.
Essentially, the Creation stage captures the initial data and makes it available to the appropriate storage medium. To move to the next stage, the Storage phase, the data must be processed properly: metadata should be added to make it searchable, for example, and access and privacy requirements identified and accounted for. This is best done automatically at the metadata layer as the data is fed into the storage media.
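As a minimal sketch of that ingest step, the snippet below attaches searchable metadata (owner, classification, timestamp, checksum) to a newly created data object as it enters storage. The `DataAsset` and `ingest` names are hypothetical; a real pipeline would use a data catalog or an object store’s tagging API instead:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

@dataclass
class DataAsset:
    """A newly created data object plus the metadata captured at ingest."""
    name: str
    payload: bytes
    metadata: dict = field(default_factory=dict)

def ingest(asset: DataAsset, owner: str, classification: str) -> DataAsset:
    """Attach searchable metadata automatically as the asset enters storage."""
    asset.metadata.update({
        "owner": owner,                    # who is accountable for the data
        "classification": classification,  # e.g. "public", "internal", "pii"
        "created_at": datetime.now(timezone.utc).isoformat(),
        "checksum": hashlib.sha256(asset.payload).hexdigest(),  # integrity check
        "size_bytes": len(asset.payload),
    })
    return asset
```

Capturing the checksum and classification at creation time is what makes the later stages (access control, tiering, destruction) enforceable without re-inspecting the data itself.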
“Properly implemented, metadata acts as a roadmap to give organizations the insights needed to control all of their data and storage resources,” said Tony Cahill, senior solutions architect at StrongBox Data Solutions. “In hybrid and cloud environments, metadata can be used to improve data resilience, and reduce egress charges by targeting specific files.”
Stage Two: Data Storage
The Storage stage is complex and carries many ramifications for the remainder of the lifecycle. If data is dumped carelessly onto the cloud or disk arrays, for example, it can easily get lost, be hard to manage, or become expensive to retain. There are many options for storage media—cloud, flash, disk, tape, or optical media, for example—but thought needs to be put into finding the right place to keep it, taking into account such factors as cost, accessibility, and the level of performance needed by the applications it serves.
Security is also a concern in modern storage, which means that data immutability, privacy, and storage location must be considered during this stage, as well as redundancy: to guard against disasters or data breaches, multiple backup copies of the data should be made. Additionally, external rules and regulations may dictate how data is stored. The European Union’s General Data Protection Regulation (GDPR), for example, restricts transfers of personal data outside EU borders and imposes harsh penalties on violators. Enterprises working in heavily regulated industries must ensure their data complies with all relevant regulations, including HIPAA, the Payment Card Industry Data Security Standard (PCI DSS), Sarbanes-Oxley, and any applicable Securities and Exchange Commission (SEC) rules.
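One way to enforce residency rules like these is a simple placement check before data is written. This is an illustrative sketch only; the regime names and region lists below are assumptions, not real policy mappings:

```python
# Map each regulatory regime to the storage regions it permits.
# Illustrative values only -- real mappings come from legal/compliance teams.
RESIDENCY_RULES = {
    "gdpr": {"eu-west-1", "eu-central-1"},
    "hipaa": {"us-east-1", "us-west-2"},
}

def placement_allowed(regime: str, region: str) -> bool:
    """Return True if storing data governed by `regime` in `region` is permitted."""
    allowed = RESIDENCY_RULES.get(regime)
    # Unknown regimes are treated as unrestricted in this sketch;
    # a production system would more likely fail closed.
    return allowed is None or region in allowed
```

Running such a check at write time, rather than auditing after the fact, keeps non-compliant placements from ever occurring.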
Organizations should also be focused on internal requirements during this stage. Data stores should be organized so as to support business objectives and business continuity in the event of natural disasters, failures, or malware.
Learn more: What Is Data Sovereignty and Why Does It Matter?
Stage Three: Data Usage
How the data is stored in the Storage stage dramatically affects the Usage stage. Stored data needs to be made available to the users and applications that need it and restricted from those that don’t. Roles must be defined carefully and access rights assigned, but security, privacy, and performance must be balanced so that the burden on users is not so great that they can’t use the data or seek alternate “shadow systems” to avoid it.
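The role-and-rights model described above can be sketched as a simple permission lookup. The role names and actions here are hypothetical placeholders; real deployments would delegate this to an identity provider or policy engine:

```python
# Each role maps to the set of actions it may perform on stored data.
# Role names and actions are illustrative assumptions.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def can_access(role: str, action: str) -> bool:
    """Check whether a role is allowed to perform an action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Keeping the check this cheap is part of the balance the article describes: if authorization is slow or convoluted, users route around it with shadow systems.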
The Usage stage also includes making data available for automated reports, dashboards, and analysis, which can bring real-time data visualization requirements with it. Analytics may be the most fundamental aspect of modern data usage, with a wide range of applications and artificial intelligence (AI) tools that need access to ever-larger data stores. Enterprise data must be managed so that both leadership and staff have access to the data they need, which requires detailed management of data at every step of the process.
Stage Four: Archiving
In the Archiving stage, thought must be given to the long-term storage of data. Because of the sheer volume of data in enterprise use, it is no longer feasible to retain everything in primary storage, whether that is flash or disk. With flash, prices climb alongside capacities, straining budgets. Even disk storage is expensive in large quantities, forcing businesses to seek a range of media to meet their budgets and needs.
An analysis by Horison Information Strategies highlights the fact that up to 80 percent of data is rarely or never accessed after the first month or two, which means that mission-critical systems are almost never going to request any of that data. The best approach is to keep the roughly 20 percent of data in active use on flash or disk and store the remainder in an immutable tape archive.
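That 80/20 split can be automated with a tiering rule keyed to last access time. A minimal sketch, assuming a 60-day “hot” window (the one-to-two-month figure above) and hypothetical tier names:

```python
from datetime import datetime, timedelta, timezone

# Assumption: data untouched for ~60 days moves off primary storage.
HOT_WINDOW = timedelta(days=60)

def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier based on time since the data was last accessed."""
    if now - last_accessed <= HOT_WINDOW:
        return "flash_or_disk"   # active ~20 percent stays on primary storage
    return "tape_archive"        # cold remainder goes to the immutable archive
```

A scheduled job applying this rule to a data catalog is typically all it takes to keep primary storage from filling with cold data.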
To alleviate concerns about how quickly that data could be made available if needed, active archive solutions can provide data from tape to analytics and AI within just a few minutes. Used in combination with tape, these tools can ensure data’s longevity and preserve access while preventing corruption and other retention challenges.
“New erasure coding algorithms optimized specifically for cold storage will enhance data protection and durability for long-term retention while reducing storage costs significantly vs. multi-copy and cloud-based solutions,” said Tim Sherbak, Quantum’s enterprise products and solutions manager.
Stage Five: Data Destruction
No data should be destroyed before going through the Archiving stage, and a well-managed archive will include provisions to destroy data that has reached its end of life. But the rise of AI and analytics has also given rise to a philosophy demanding that data be retained indefinitely—because who knows when it might prove useful?
The practicalities of such an approach present challenges. Storing data until the end of time would be an expensive proposition. One solution might be to summarize old data, or to subject it to analysis and classification before it is destroyed, preserving a record of its key facets without burdening organizations with unwieldy data storage requirements.
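As an illustration of that summarize-then-destroy idea, the sketch below reduces a batch of old records to a handful of key facets before deletion. The record shape (a numeric `value` and a timestamp `ts`) is an assumption for the example:

```python
import statistics

def summarize_before_destroy(records: list[dict]) -> dict:
    """Reduce a batch of expiring records to key facets worth keeping."""
    values = [r["value"] for r in records]
    return {
        "count": len(records),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "first_ts": min(r["ts"] for r in records),  # span of time covered
        "last_ts": max(r["ts"] for r in records),
    }
```

The summary occupies a few hundred bytes regardless of how many records it replaces, which is the whole appeal of the approach.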
Another point to consider is that the destruction of data can have serious implications. Improperly destroyed data can pose a cybersecurity or privacy risk, and data destroyed prematurely can be a compliance violation. So can data retained for too long, which also has cost ramifications. The Destruction stage sounds simple enough, but in practice it requires careful consideration. Enterprises will need to take their own internal needs into account and weigh them against external and legal requirements.
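Those two failure modes, destroying too early and retaining too long, can be expressed as a retention window check. A minimal sketch with illustrative policy values; actual minimum and maximum holds come from the applicable regulations:

```python
def destruction_status(age_days: int, min_days: int, max_days: int) -> str:
    """Classify a record's retention state against a policy window."""
    if age_days < min_days:
        return "retain"    # destroying now would violate the minimum hold
    if age_days > max_days:
        return "overdue"   # kept too long: compliance and cost risk
    return "eligible"      # may be destroyed under the policy
```

Surfacing the “overdue” state explicitly matters as much as blocking early destruction, since over-retention is the violation that tends to accumulate silently.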
The Benefits of Data Lifecycle Management
The key benefits of incorporating data lifecycle management into an enterprise are numerous, but generally fall into three areas.
By bringing data out of silos and making it accessible to analytics and artificial intelligence systems, organizations glean a great many more insights than would otherwise be possible. This can have an impact on everything from reporting and real-time monitoring to customer engagement and competitive intelligence.
A system must be in place to look after data in accordance with the best interests of users, shareholders, and the organization as a whole. This ensures that data is processed and available wherever and whenever it is needed and plays a crucial role in compliance.
The correct processes and data lifecycle technologies should provide sufficient cybersecurity and privacy safeguards and prevent data from being lost to errors such as missing backups, corruption, or theft.
Bottom Line: Managing the Data Lifecycle Stages
No matter how much thought and planning goes into data lifecycle management, errors will be made, and adjustments will be needed. Data lifecycle management is an ongoing process, not a one-and-done. To successfully manage data throughout its lifecycle, enterprises should listen to users, those who work with the data day in and day out. If they complain that valuable data is being ignored, that certain types of data need to be retained longer, or that privacy and security hurdles have hindered performance, adjustments may be needed. External sources, such as regulatory bodies and legal authorities, also need to be taken into consideration. Constant monitoring and improvement of all data lifecycle processes can eventually produce a successful model for all concerned.
Read next: Data Management: Types and Challenges