4. Big Data application: Zaloni Bedrock
How this Big Data app works: Many businesses know they want to implement a Hadoop data lake, but don’t know how to do so in a cost-effective, scalable way. Moreover, simply putting data into Hadoop does not make it ready for analytics. To use common analytics toolsets, you must know where data is, how it’s structured (or not) and where it came from.
You may also need to prepare it by filtering or joining datasets together, or masking out parts that are sensitive in nature. This typically takes a significant amount of time and effort and can be highly error-prone. If you’ve done a poor job ingesting, organizing, and preparing data for analytics, the results of your analytics will be equally poor. Flawed analytics can lead to flawed business decisions, and making better business decisions was the whole point of the data lake in the first place.
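To make the preparation steps above concrete, here is a minimal sketch in plain Python of filtering, joining, and masking two small datasets. It is not how Bedrock does it; the sample records, the field names, and the `mask` and `prepare` helpers are all hypothetical and exist only for illustration. Note that a truncated hash hides the raw value but is not a complete anonymization scheme.

```python
import hashlib

# Hypothetical sample data; every field name here is an assumption.
customers = [
    {"customer_id": 1, "name": "Ada Smith", "ssn": "123-45-6789", "region": "US"},
    {"customer_id": 2, "name": "Bo Chen", "ssn": "987-65-4321", "region": "EU"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 75.0},
    {"order_id": 12, "customer_id": 1, "amount": 40.0},
]


def mask(value):
    """Replace a sensitive value with a short one-way hash token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def prepare(customers, orders, region):
    """Filter customers by region, join them to orders, and mask the SSN."""
    # Filter: keep only customers in the requested region, keyed by ID.
    kept = {c["customer_id"]: c for c in customers if c["region"] == region}
    prepared = []
    for order in orders:
        customer = kept.get(order["customer_id"])
        if customer is None:
            continue  # the order belongs to a filtered-out customer
        # Join: combine order and customer fields; the raw SSN is never copied.
        prepared.append({
            "order_id": order["order_id"],
            "amount": order["amount"],
            "name": customer["name"],
            "ssn_token": mask(customer["ssn"]),
        })
    return prepared


rows = prepare(customers, orders, "US")
for row in rows:
    print(row)
```

Even in this toy version, a wrong join key or a forgotten masking step would silently corrupt everything downstream, which is the kind of error the paragraph above warns about.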
With Zaloni Bedrock, the process is automated. According to Zaloni, you set it up once and you’re done. It doesn’t matter how much data you are adding to the lake, since there is no technical limit.
Zaloni argues that without a product like Bedrock to help you along, 60 percent or more of the time and effort you spend to build an analytics system using a Hadoop data lake will be spent on data management and data preparation alone.
Use case of note: UnitedHealth Group’s Optum division, an IT and tech-enabled health services business, uses Bedrock as part of its data platform to manage services like data ingest and workflow execution. Bedrock enables Optum to monitor multiple data sources and to capture and store schema and operational metadata, and it provides features like data catalog search for end users.
5. Big Data application: Tamr
How this Big Data app works: Tamr is a data-connection and machine-learning platform designed to make enterprise data as easy to find, explore, and use as Google. According to Tamr, due to the cost and complexity of connecting and preparing the vast, untapped reserves of data sources available for analysis, most organizations use less than 10 percent of the relevant data available to them.
It’s just too manual, too inefficient and too expensive to connect and ready the massive variety of internal and external data for analytics and other applications critical for business growth. Tamr argues that if the industry is going to be successful at helping customers manage the growth and variety of data that lies ahead – from internal sources, external public and private sources, Internet of Things feeds, etc. – a complete overhaul of traditional methods of information integration and quality management will be required.
Use case of note: Multinational media and information company Thomson Reuters faced challenges maintaining critical, accurate data. It had outgrown its manual curation processes and looked to Tamr to provide a better solution for continuously connecting and enriching its core enterprise information assets (data on millions of organizations with more than 5.4 million records pulled from internal and external data sources).
Using Tamr, one project that Thomson Reuters estimated would take six months was completed in only two weeks, requiring just forty hours of manual review time – a 12x improvement over the previous process. The share of records requiring manual review shrank from 30 percent to 5 percent, and the number of identified matches across data sources increased by 80 percent – all while achieving Thomson Reuters’ 95-percent precision benchmark.
Tamr says that the disambiguation rate (or the rate of resolving conflicts) rose from 70 percent to 95 percent. Furthermore, the knowledge Tamr gleaned from its machine learning activities means that future data integration will take even less time per source.
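The idea of matching records automatically and sending only the uncertain ones to a human can be sketched in a few lines of Python. This is not Tamr's algorithm, which relies on machine learning; the sketch swaps in a simple string-similarity score from the standard library, and the company names, the thresholds, and the `similarity` and `triage` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs, auto_match=0.9, auto_reject=0.5):
    """Sort candidate record pairs into match, review, and no_match buckets.

    The thresholds are illustrative assumptions, not tuned values.
    """
    buckets = {"match": [], "review": [], "no_match": []}
    for a, b in pairs:
        score = similarity(a, b)
        if score >= auto_match:
            buckets["match"].append((a, b))      # confident enough to merge
        elif score <= auto_reject:
            buckets["no_match"].append((a, b))   # confident they differ
        else:
            buckets["review"].append((a, b))     # uncertain: needs a human
    return buckets


# Hypothetical candidate pairs drawn from two data sources.
sample_pairs = [
    ("Acme Holdings", "ACME HOLDINGS"),
    ("Acme Holdings", "Acme Holdings Inc"),
    ("Globex", "Initech"),
]
buckets = triage(sample_pairs)
for label, items in buckets.items():
    print(label, items)
```

Only the pairs that land in the review bucket cost human time, so reducing the manual workload in a scheme like this comes down to shrinking that middle band without letting wrong matches through.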