Data classification deals with the various processes used to understand information assets.
It incorporates techniques such as assigning a value to those assets, tagging them in different ways, and determining the effort and cost required to secure them, especially the most critical among them. Tags might denote data type, sensitivity, value, risk, and so on.
Such practices help organizations appreciate the value of their data so they can decide what requires the highest level of protection and how much risk is posed to that data. Based on that foundation, they can implement controls that are designed to mitigate risks.
Here are some of the top trends related to data classification:
1. Governments loosen the reins to enable data sharing
Expect to see government agencies transition from a need-to-know to a need-to-share approach for data management, implementing data sharing mandates across many programs, according to Nancy Patel, VP and GM of public sector, Immuta.
This is a response to an expanding threat landscape with an uptick in near-peer threats to the U.S. as well as threats against U.S. alliances and partnerships. Homeland security threats, protection from the next pandemic, and the need to better understand macroeconomic trends, for example, are triggering the need for data sharing across governments and agencies.
“In order to share data safely and securely, government entities will utilize tools like data masking and redaction to enforce data policies in real-time,” Patel said.
“At the same time, ensuring the right people have access to the data they are allowed to access at the right time remains paramount.”
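The data masking and redaction Patel mentions can be sketched with simple pattern-based rules applied at read time. This is a minimal illustration, not any vendor's implementation; the specific patterns (U.S. Social Security numbers, phone numbers) and masking formats are assumptions chosen for the example.

```python
import re

# Hypothetical policy for this sketch: mask SSNs entirely, and show only
# the last four digits of phone numbers, before a record is shared.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_RE = re.compile(r"\b(\d{3})-(\d{3})-(\d{4})\b")

def redact(text: str) -> str:
    """Apply masking rules to free text at the moment it is read or shared."""
    text = SSN_RE.sub("XXX-XX-XXXX", text)
    text = PHONE_RE.sub(lambda m: f"XXX-XXX-{m.group(3)}", text)
    return text
```

Applying the policy at read time, rather than rewriting stored data, lets the same underlying records serve audiences with different access levels.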
2. Data classification is essential to data protection
Fundamental to every company’s success is maintaining the security of its data amid the stunning velocity with which that data is created and used.
A major trend has been the transition away from protecting all data as a whole toward more robust tiering and group-based classification systems, according to Chadd Kenney, VP of product, Clumio.
This ensures data security and compliance, while also providing cost savings to organizations.
Take, for example, a health care data lake. A majority of the information that is backed up from that data lake requires only 30 days of retention for operational recoveries. But the data lake may also contain health records that need to be retained for six years to comply with HIPAA. In this case, rather than backing up all the objects that comprise the data lake for six years, data classification during backups can reduce costs by over 90% without any compromise to security and compliance.
“Classification by access patterns, object tags, tiers, and other metadata ensures that businesses are able to store their data in a way that’s neither overprotected nor under-protected, but perfectly tailored to the unique aspects of that dataset,” Kenney said.
“The past year has shown a shift to classifying data in a way that best protects it while reducing costs and meeting compliance standards.”
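The health care data lake example above can be sketched as a tag-driven retention lookup: each backed-up object carries classification tags, and the longest retention implied by any tag wins. The tag names and tier lengths here are assumptions for illustration (30 days operational, roughly six years for protected health information), not a specific product's policy.

```python
from dataclasses import dataclass

# Illustrative retention tiers, assuming tags already exist on each object:
# "phi" (protected health information) is held ~6 years for HIPAA;
# everything else needs only 30 days for operational recovery.
RETENTION_DAYS = {"phi": 6 * 365, "default": 30}

@dataclass
class BackupObject:
    key: str
    tags: set

def retention_for(obj: BackupObject) -> int:
    """Pick the longest retention implied by any of the object's tags."""
    days = [RETENTION_DAYS[t] for t in obj.tags if t in RETENTION_DAYS]
    return max(days, default=RETENTION_DAYS["default"])
```

Because only the tagged minority of objects gets the long tier, the bulk of the data lake falls into the 30-day tier, which is where the cost savings come from.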
3. Efficient indexing and sampling
The path to success is often said to be paved with data. The real problem now, though, is how to handle it all.
Data classification, storage, and analysis, after all, are often easier said than done. And as the sheer volume of data being processed continues to grow geometrically, the brute-force compute required to classify and model all of that information is becoming unwieldy.
“Efficient indexing and sampling are the only ways to train and generate more lightweight models that can tackle pointed applications,” said Daren Trousdell, CEO and chairman, NowVertical Group.
“As data retention goes off the deep end, it’s critical that organizations are in total control of their data estate, have the ability to determine what is critical — and more importantly, what isn’t — and then prioritize their storage and compute efforts accordingly. This kind of proper data management and prioritization is the only way for organizations to properly unlock their potential.”
4. Surgical data management
Data classification is growing in importance, particularly as it relates to unstructured data management.
It is essential to make data easy to search and manage, supporting risk management, compliance, and legal requirements as well as storage cost and performance optimization.
For instance, if IT wants to classify all data belonging to ex-employees as “zombie data,” a data classification strategy would find and aggregate those files and confine them for deletion.
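The zombie-data example can be sketched as a pass over a file inventory that tags and collects anything owned by a departed employee. The record fields ("owner", "path") and the "zombie" tag name are assumptions for this sketch.

```python
# Hypothetical inventory: each file record carries an "owner" field.
# Files owned by departed employees are tagged as zombie data and
# gathered into a confinement list pending deletion review.
def confine_zombie_data(inventory, ex_employees):
    confined = []
    for record in inventory:
        if record["owner"] in ex_employees:
            record["tags"] = record.get("tags", set()) | {"zombie"}
            confined.append(record["path"])
    return confined
```

Keeping the confined files tagged, rather than deleting them immediately, leaves room for a legal or compliance review before anything is destroyed.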
“Increasingly storage teams will work with departmental IT managers and other data professionals to manage data more surgically versus with one-size-fits-all policies: ideally, delivering the right data to the right teams at the right time,” said Darren Cunningham, VP of marketing, Komprise.
“IT organizations are amid a shift from managing storage technologies to managing data, as massive data growth is pushing the limits of IT budgets and staff time and as hybrid cloud infrastructures prevail, creating more data silos. By managing data and its movement between storage tiers from on-premises to edge to cloud, IT can reduce costs and move data where it needs to be in the moment for value generation.”
For instance, an R&D team could devise a policy based on a metadata tag denoting file type, such as instrument data, and project name, whereby those tagged files would automatically move to a cloud data lake once the project has finished. Later, other researchers could manipulate that data as needed to support new research or product development.
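A policy like the R&D example can be expressed as a query over metadata tags that produces a list of move actions. The tag keys, the "instrument" file type, and the bucket URL below are all assumptions made for this sketch.

```python
# Illustrative tiering policy: files tagged with a given file type and a
# finished project name are scheduled to move to a cloud data lake.
def plan_moves(files, finished_projects, file_type="instrument",
               target="s3://research-data-lake"):
    moves = []
    for f in files:
        tags = f.get("tags", {})
        if tags.get("file_type") == file_type and tags.get("project") in finished_projects:
            moves.append((f["path"], f"{target}/{tags['project']}/"))
    return moves
```

Separating policy evaluation (which files match) from execution (actually copying them) makes it easy to preview a move before committing storage and egress costs.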
Modern data classification will depend upon data management solutions that support user-driven tagging and role-based access controls, so departmental users can view and search their own file and object shares and buckets. IT organizations will also need to invest in their people, through training and education of traditional data storage architects, to take on new unstructured data management and data services requirements for data tagging, classification, analysis, and automated data policy and workflow management.
David Wagner, senior research director at Avasant Research, said the biggest trend in data classification is automation.
Data management vendors are offering services that automatically tag and classify data, not only by type or sensitivity, but by important identifiers.
For instance, personally identifiable information such as a social security number can be automatically tagged for either deletion or special storage requirements. In regions with “right to be forgotten” rules, information can be tagged for deletion based on age or any other pertinent factor. Similarly, if data is required to be kept a certain number of years, such as tax data and some health care records, the data can be tagged to be held until a determined date. Automation is crucial to this operation.
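The automated tagging described above can be sketched as a scanner that flags records containing an SSN as PII and stamps retention-bound categories with a hold-until date. The seven-year hold for tax records and the field names are assumptions for this illustration, not a statement of any jurisdiction's actual rules.

```python
import re
from datetime import date, timedelta

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Illustrative auto-tagger: flags records containing an SSN as PII, and
# stamps tax records with a hold-until date (7 years is an assumption).
def auto_tag(record, today=None):
    today = today or date.today()
    tags = set()
    if SSN_RE.search(record.get("body", "")):
        tags.add("pii")
    if record.get("category") == "tax":
        record["hold_until"] = today + timedelta(days=7 * 365)
    record["tags"] = tags
    return record
```

Once tags and hold dates are machine-applied at ingest, downstream deletion and retention jobs can act on them without a human in the loop, which is exactly the error-prone step Wagner warns about below.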
“Using humans or business processes to shunt data into the proper storage format is riddled with human error,” Wagner said.
“Not only that, but companies could be paying to store data they no longer wish to store or be fined for holding data they are not permitted to store. Automated tagging and classification also minimizes the cost of breaches, ensuring that the only data lost is what the company truly needed to hold.”