Saturday, June 22, 2024

Trends in Information Governance: eDiscovery and Big Data


The Information Governance (IG) concept has been around for years. Implementation has proven difficult, sometimes downright impossible, because the domain is so broad. Enterprise-wide IG is massively scaled, crossing workgroup, location, data type, and business process boundaries. When the enterprise decides to implement IG, it faces the prospect of developing new processes and technology across the entire organization. In addition, the enterprise faces dynamic changes over time: as information management needs morph and shift over the years, IG morphs right alongside them.

Given this broad scale, organizations have the best chance for success by concentrating on the highest-risk areas and biggest pain points that can benefit from IG. IT has a strong claim here, particularly with unstructured data. Unstructured data now accounts for 75-80% of corporate data, and data volumes are growing 35-50% year-over-year. In the midst of this massive data growth, IT is directly responsible for data lifecycle management, user access, data security, and compliance. IT is also frequently involved with eDiscovery collections and big data analysis.

These represent big challenges – and IG tools concentrating on unstructured data directly benefit all of these processes. Introducing IG into the data management domain is best done with a combination of technology toolsets and organizational priorities that drive policy settings. This combination can deliver IG to manage data across the data center and the organization. Often the same tools can extend to additional workgroups, including Legal, Records Management and Business Analysis.

What Information Governance can do for IT

One of IT’s biggest responsibilities is to manage unstructured data. Data of this type exists in many different formats and locations, which makes it a challenge (to say the least) to manage. Governing this data takes IG technology that can intelligently manage many different types of data for compliance, lifecycle, and value.

File analysis platforms are IT’s primary means to deliver IG services to unstructured data. Modern file analysis platforms offer more than basic metadata mapping and alerts: they classify files on a rich variety of characteristics and do so at massive scale. They also offer sophisticated query and policy support.

File management products range from specialized storage systems to software that universalizes classification and policies across multiple data sources. The latter has the advantage in governing data across the enterprise, because it enables IT to manage files across different storage systems, application servers, and the cloud. This is a big benefit: the cloud environment is subject to the same data management issues that affect files stored behind the firewall. It gets even more complicated because “the cloud” is not a monolithic location. Organizations and employees may be storing files in Office 365, public cloud providers, Box or Dropbox, and more. Distributed IG toolsets can discover and analyze files in the cloud as well as on-premises.

Acaveo Smart Information Server (SIS) is an example of distributed file analysis. Acaveo centralizes operational intelligence for files located across multiple on-premise, distributed and cloud data sources. The platform discovers files, classifies them by many characteristics, and applies policies. Irish company Ostia developed Portus to map system and application dependencies between different IT systems.
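The discover, classify, and apply-policy flow described above can be illustrated with a small sketch. This is not how Acaveo SIS or any other product works internally; it is a minimal local-filesystem example, and the names `FileRecord`, `discover`, and `apply_policy`, as well as the age and size thresholds, are hypothetical choices made for illustration.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Basic metadata gathered for one discovered file (hypothetical schema)."""
    path: str
    extension: str
    size_bytes: int
    age_days: float


def discover(root, now=None):
    """Walk a directory tree and collect metadata for every file found."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            records.append(FileRecord(
                path=full,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=info.st_size,
                # Age is measured from the last modification time.
                age_days=(now - info.st_mtime) / 86400,
            ))
    return records


def apply_policy(record, archive_after_days=365,
                 large_file_bytes=100 * 1024 * 1024):
    """Map a file record to an action. Thresholds are illustrative assumptions."""
    if record.age_days > archive_after_days:
        return "archive"
    if record.size_bytes > large_file_bytes:
        return "review"
    return "retain"
```

Calling `apply_policy(r)` for each `r` in `discover("/some/share")` would yield one action per file. A real platform would classify on far richer characteristics and span many data sources, but the shape of the pipeline is the same.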

Storage makers also provide IG tools for unstructured data. Tarmin’s GridBank software uses a global namespace to pool data across distributed storage hardware. GridBank aggregates storage and IG services including data management, search, eDiscovery and analytics. Newcomer Qumulo offers massively scaled storage systems with intelligent data analysis tools. Built-in intelligence discovers, retrieves, and manages massively scaled data located in its system.

Storage system vendors also engineer IG services into their arrays. IBM and HP offer IG services around big data. EMC’s SourceOne division houses its formal IG offerings. SourceOne eDiscovery tools include collection and early analysis, and SourceOne File Intelligence offers file-based information governance.

IT is also responsible for security and compliance, unstructured big data, and often eDiscovery collections and early analysis.

IT employs user access control to secure data. Sadly, many IT departments lack strong, compliant policies for user and role access settings. IG tools can strengthen access control by discovering file and folder access settings and remediating security holes. IT is also responsible for securing sensitive data containing personally identifiable, credit card, or health information. IG tools can discover and identify this sensitive data, allowing IT to move or delete information as needed. Vendors like Acaveo that offer strong integration with Active Directory can close big access security holes.
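As a rough illustration of what discovering sensitive data can involve, the sketch below scans text for two common patterns: US Social Security numbers and payment card numbers. It is a simplified assumption of how such a scan might look, not a description of any vendor’s detection engine. The function names and regular expressions are hypothetical, and the Luhn checksum is used only to cut down on false positives for card numbers.

```python
import re

# Simplified patterns, assumed for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, value) pairs for SSN-like and card-like strings in text."""
    findings = []
    for match in SSN_PATTERN.finditer(text):
        findings.append(("ssn", match.group()))
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            findings.append(("card", digits))
    return findings
```

Files that produce findings could then be flagged for relocation or deletion under whatever policy the organization sets. Production tools typically combine pattern matching with context and validation rules well beyond this sketch.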

Big data analysis frequently uses unstructured data as its information source. The issue is that this same data is spread around multiple applications and locations, and may or may not be easily accessible for meaningful analysis. IG tools can identify big data sources by set characteristics and apply trend analysis across a wide universe of data.

Unsurprisingly, IBM is a major player in governing big data. IBM InfoSphere operates across structured and unstructured data, with deep integration between data sources and standardized processes for big data management and reporting. IBM also owns eDiscovery platform StoredIQ, which includes data analysis and governance in its software suite. HP offers a full complement of information governance tools for big data under its Intelligent Retention and Content Management platform. The suite integrates HP StoreAll, ControlPoint and Records Manager to govern data residing on HP Haven big data storage.

eDiscovery is a perennial IG development driver, with software tools existing to aid in collection and analysis throughout the EDRM workflow. In addition, eDiscovery software makers often engineer their products to work with related processes including records management and compliance. IG-related eDiscovery tools are not limited to eDiscovery software makers: file analysis products often provide eDiscovery tools as well.

Of the pure eDiscovery vendors, Exterro is a full-service eDiscovery provider whose platform includes IG tools like E-Discovery Data Mapping. AccessData and Nuix Luminate also provide robust IG tools. 


IT is more used to thinking in terms of managing data than governing it. Data management seems practical and achievable, while governance seems to be an exercise in controlling the uncontrollable. It doesn’t help that IG relates to multiple workgroups and processes throughout the enterprise. However, IG technology can directly benefit data management, as well as governance, compliance, and security. These are all domains critical to IT. In this age of massive unstructured data growth, managing data for compliance and value is a top priority for IT. The IG tools to manage it are available today.

