MapR Technologies Brings Enterprise-Class to Big Data and Operational Data Alike

Written By

David Hill

Sep 5, 2014

Simply stated, MapR Technologies delivers an enterprise-class distribution for Apache Hadoop with an in-Hadoop NoSQL (Not-only SQL) database. What that does not convey, though, is how critical this is to making both big data and operational data work not only at scale, but also with the performance, availability, security and other characteristics that give meaning to the term “enterprise-class.” Let’s consider why that is the case.

The Data Deluge Demands a New Architecture

The data deluge continues. IDC (International Data Corporation) identifies three IT platforms that have emerged over time. The mainframe (the first platform) and client-server (the second) are not going away, but the third platform (cloud, mobile, social media and big data) has to be taken fully into account. The first two platforms tend to be application-driven, whereas the third tends to be primarily data-driven.

As I have said before, application-driven means that the data is created and exists to meet the needs of the application (such as an online transaction processing system) so the pair is tightly coupled. Data-driven means that the data is created and exists to meet its own needs; yes, an application may support its creation (such as an e-mail), but the data has meaning and value apart from the application. Applications in a data-driven world are servants, not masters.

The mainframe is still the torchbearer for SQL-type structured databases. The client-server architecture is home to a lot of structured data (that can be sorted in traditional relational databases), but it also deals with a lot of semi-structured data (e-mails, word processing documents and the like that can be searched). The third platform is a real mixed bag of structured data, semi-structured data and truly unstructured data (bit-mapped data that can be sensed, such as video, but neither sorted nor searched directly, except through attached metadata).

Enterprises create all these varieties of data in seemingly endless volumes, so it would be incredibly handy to be able to manage all types in a single database. However, that database cannot be a table-limited, SQL-type database, which, while very useful, was designed for a much more constrained data world. The answer is Hadoop.

Hadoop in Its Native State Is Not Enough

Open source Hadoop is the new database architecture of choice. Its “shared nothing” approach, in which each node in a cluster has its own memory and disks rather than sharing them with other nodes, has proved very popular and is an exemplar of why the open source movement has been so productive. Alas, while Hadoop has proven its worth, it also has limitations, such as not being inherently enterprise-grade or enterprise-class.
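To make the “shared nothing” idea concrete, here is a minimal Python sketch, not Hadoop’s actual implementation. A hypothetical `Node` class owns a private list of records, records are routed to nodes by hashing their key (the use of `zlib.crc32` for partitioning is an illustrative assumption), each node computes a partial result over only its own data, and the partial results are merged at the end. All class and function names here are invented for illustration.

```python
import zlib
from collections import Counter


class Node:
    """Hypothetical cluster node: its storage is private, shared with no other node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.records = []  # local "disk"; only this node reads or writes it

    def store(self, record):
        self.records.append(record)

    def local_count(self):
        # Each node works only on the data it holds locally.
        return Counter(self.records)


def route(key, num_nodes):
    # Stable hash partitioning: the same key always lands on the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def shared_nothing_count(keys, num_nodes=3):
    nodes = [Node(i) for i in range(num_nodes)]
    for key in keys:
        nodes[route(key, num_nodes)].store(key)
    # Merge step: combine the independent partial results from every node.
    total = Counter()
    for node in nodes:
        total.update(node.local_count())
    return total


print(shared_nothing_count(["hadoop", "mapr", "hadoop", "nosql"]))
```

Because no node ever reads another node’s records, adding nodes adds both storage and compute; the trade-off is that any computation needing data from several nodes requires an explicit merge step like the one at the end.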

This may not matter for an otherwise beneficial big data predictive analysis where, if the working copy of the data were lost or security were compromised, no revenue would be lost and no truly sensitive information exposed, so no real harm would be done. But in enterprises where data analysis and tight security have real value (and whose employees could lose their jobs if something major goes wrong), enterprise-class is not a buzzword but a necessary state.

Moreover, enterprises may value other characteristics, such as performance, that not all Hadoop distributions are able to deliver. There are at least two ways to solve the problem. One is to deliver an enriched, non-Hadoop database with similar capabilities. This gives the vendor total control over future product development, but does not take advantage of the leverage of no- or low-cost open source development. The second is to build upon the Hadoop framework and add extensions that enrich the distribution while adhering to standard APIs.

That yields a proprietary solution that adds value without imposing lock-in. MapR Technologies has elected to follow this alternative.

What MapR Technologies Brings to the Table

Big data disrupts traditional IT thinking in many ways. One is the overwhelming number of data types that have to be managed. Big data can include transactions (live credit card transactions and historical call detail records), streamed data (sensor/machine-based data, such as from the Internet of Things), interactional data (such as clickstream results), and observational data (such as customer sentiments). As has often been said, data has now taken its place as the fourth factor of production beside the three in traditional economics: land, labor, and capital.

Traditional data warehouses still have a place in the world, but analyst Doug Laney (now of Gartner) had it right when he described the volume, variety and velocity of data; together, they call for a new data architecture. Hadoop represents that new world not only by being able to handle all data types, but also by requiring a schema only when data is read (so-called schema-on-read) and by closely coupling processing with data in a scale-out, parallel fashion. This is in stark contrast to the expensive and time-consuming process of designing and building a traditional “build it and hope they will come” data warehouse.
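The schema-on-read idea can be illustrated with a short Python sketch (Python 3.7+ for `datetime.fromisoformat`). Raw records are stored exactly as they arrive, and a schema, here a hypothetical `PURCHASE_SCHEMA` mapping field names to conversion functions, is applied only when a particular job reads the data; a different job could read the same raw records with a different schema. The record layout and field names are invented for illustration, and this is not how Hadoop itself parses data.

```python
import json
from datetime import datetime

# Raw events are kept as-is; nothing validates them at write time.
RAW_EVENTS = [
    '{"user": "u1", "ts": "2014-09-05T10:00:00", "amount": "19.99"}',
    '{"user": "u2", "ts": "2014-09-05T10:05:00", "clicks": 3}',
]

# Hypothetical schema chosen by one reader: field name -> conversion function.
PURCHASE_SCHEMA = {
    "user": str,
    "ts": datetime.fromisoformat,
    "amount": float,
}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and project it onto `schema` at read time."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields the schema expects but the record lacks come back as None;
        # fields the schema does not mention (such as "clicks") are ignored.
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


for row in read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA):
    print(row)
```

In a schema-on-write warehouse, the second event would have been rejected or forced into a predefined table at load time; here it is stored untouched, and the decision about how to interpret it is deferred to each reader.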

A fundamental capability offered by MapR is that it tightly couples the analytical base that Hadoop delivers with the operational capabilities provided by traditional relational databases from vendors such as Oracle, IBM, or Microsoft. For many analyses, such as churn analysis (a form of predictive analysis), historical information delivered in batch mode is sufficient. But in a world of mobile and Web application servers, the ability to operate in real time on user data (such as user profiles and states, user interactions and real-time location data) is crucial.

In short, MapR provides a level of integration across data types that native Hadoop does not deliver. That includes support for mixed workloads, integrated search capabilities, and the ability to deploy and manage Hadoop and NoSQL as a single cluster. In addition, as part of its in-Hadoop database, MapR delivers enterprise-friendly capabilities for a distributed, unified namespace, data management and data protection (including disaster recovery) for all data files and tables.

As part of its enterprise-class capabilities, MapR claims zero downtime, providing the high availability and disaster recovery that enterprises require through no single point of failure, instant recovery upon node failure, and no need for regular maintenance downtime. Security is key to claiming enterprise-class status, and MapR delivers wire-level authentication and encryption as well as fine-grained access control. And for database reliability, MapR provides the well-known ACID (atomicity, consistency, isolation and durability) guarantees for row-level transactions.
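“Row-level” ACID means that a transaction’s guarantees are scoped to a single row: all column changes to that row are applied together or not at all, and concurrent writers to the same row do not interleave. The toy Python sketch below illustrates that scope with a hypothetical `RowStore` class that uses one lock per row. It is not MapR’s API or implementation, and it omits durability entirely (a real store would write to a log on disk before acknowledging a write).

```python
import threading


class RowStore:
    """Toy in-memory table: each row is a dict of columns with its own lock."""

    def __init__(self):
        self._rows = {}
        self._row_locks = {}
        self._registry_lock = threading.Lock()  # guards creation of per-row locks

    def _lock_for(self, row_key):
        with self._registry_lock:
            return self._row_locks.setdefault(row_key, threading.Lock())

    def get(self, row_key):
        with self._lock_for(row_key):
            return dict(self._rows.get(row_key, {}))  # return a copy, not the live row

    def update_row(self, row_key, mutations, check=None):
        """Apply every column mutation to one row, or none of them."""
        with self._lock_for(row_key):  # isolation: one writer per row at a time
            proposed = {**self._rows.get(row_key, {}), **mutations}
            if check is not None and not check(proposed):
                return False  # consistency check failed: stored row is unchanged
            self._rows[row_key] = proposed  # atomicity: new version published in one step
            return True


store = RowStore()
store.update_row("acct-1", {"balance": 100, "status": "open"})
ok = store.update_row(
    "acct-1",
    {"balance": -5, "status": "overdrawn"},
    check=lambda row: row["balance"] >= 0,
)
print(ok, store.get("acct-1"))  # False {'balance': 100, 'status': 'open'}
```

Note what this sketch does not cover: a change spanning two rows (say, moving money between `acct-1` and `acct-2`) would take two separate locks and is not atomic here, which is exactly the limit implied by “row-level” transactions.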

But all these capabilities (and many others as well) would be for naught if MapR did not also deliver high performance and scalability. MapR states that it delivers response times under 2 ms (milliseconds) with consistently low read latency, and claims four to ten times better throughput than other NoSQL databases (although competitors may beg to differ). MapR is scalable to 10,000 nodes, with the ability to handle millions of columns, trillions of rows per table and up to one trillion tables. Big data is big, but it is not infinite, and this should be enough for the vast majority of cases (probably all, but one has to allow for extreme cases).

Mesabi Musings

If data is indeed the fourth factor of production in economics, then its impact is not only huge now, but will become ever more important as time goes by. And “big data” is the term to think about with regard to that deluge of data and what it means. In addition, traditional data architectures are not able to meet the demands of volume (tsunami rather than fire hose is probably the best analogy), variety (the number of data types seems to be exploding), and velocity (mercurial). Hadoop was meant to address the situation, but while the direction and basic capabilities pursued by that open source community are on the right track, native Hadoop alone is not enough.

MapR believes that its in-Hadoop database distribution addresses these issues. With its integrative capabilities, MapR marries traditional batch analytics with real-time operational data. With its performance and scalability, it meets the needs of a wide range of use cases. And given its enterprise-class capabilities, there is no reason for enterprises not to give MapR Technologies a good look for managing the big-and-getting-bigger data deluge.
