The open source Hadoop project is all about providing the ability to manage and understand large datasets. Yahoo, which uses Hadoop to manage 120 terabytes of data per day, released a new version of its edition of Hadoop this week, but it wasn’t the only one with a new Hadoop release.
Commercial Hadoop vendor Cloudera this week announced Cloudera’s Distribution for Hadoop (CDH) version 3, including some technologies that were previously closed source. In addition to the new version of CDH, Cloudera is announcing a new Enterprise version of its Hadoop distribution, providing additional usability and management features for enterprise users.
CDH is a version of the Apache Hadoop project that bundles additional projects and technologies to make Hadoop more usable for enterprises. CDH includes the Yahoo-developed open source Oozie workflow engine as well as projects that originated at Cloudera. Among the Cloudera-originated projects is one called HUE (Hadoop User Experience), which began its life as the closed source Cloudera Desktop.
“Cloudera Desktop was a desktop based user interface for people building apps for Hadoop,” Cloudera CEO Mike Olson told InternetNews.com. “That was always available for free, but it wasn’t open source. We believe that the platform has got to be open source in order to succeed.”
Olson added that Cloudera has rebranded the desktop product as HUE and that it has since evolved. He explained that HUE has become a collection of APIs and an SDK aimed at developers who want to build attractive applications that talk to a Hadoop cluster.
Additionally, Olson noted that Cloudera developed the open source Flume project. The Flume project, which is included as part of CDH, is all about getting various data sources into a Hadoop cluster in a continual, reliable and fault-tolerant way. Flume is a complement to the Sqoop project, also developed and open-sourced by Cloudera, which is a tool for importing database tables into Hadoop.
With the HBase project included in CDH, Cloudera is also aiming to expand beyond just SQL types of database inputs.
“HBase is a NoSQL layer on top of HDFS (Hadoop’s filesystem),” Olson said.
Cloudera Enterprise
To date, Cloudera has built its business around offering services for Hadoop, but with Cloudera Enterprise, they’re now aiming to monetize software as well. Cloudera Enterprise includes deployment management tools as well as support and legal indemnification.
As to where Cloudera draws the line between what is an open source feature for CDH versus what is an Enterprise feature for paying customers, it’s all about the platform.
“If it is a platform feature, it belongs in the open source platform,” Olson said. “Platform features include ways to store data reliably — basically any of the plumbing that is required to make data storage and analysis work well.”
Olson explained that the enterprise features are the tools that are required to integrate Hadoop clusters with existing infrastructure and the dashboards that IT staff need to manage thousands of nodes in a cluster.
While Yahoo is a big contributor and backer of Hadoop, Olson doesn’t see Yahoo’s version of Hadoop as being competitive with Cloudera’s corporate efforts. Olson noted that Cloudera benefits from the work that is done in the open source Hadoop community, including Yahoo’s contributions. That said, in his view the Yahoo version of Hadoop isn’t necessarily the right fit of services for enterprise deployments.
“Yahoo has built a Hadoop distro that runs well on its own infrastructure,” Olson said. “Not all enterprises have the same compute infrastructure as Yahoo does and Yahoo does not provide support for that software.”
Sean Michael Kerner is a senior editor at InternetNews.com, the news service of Internet.com, the network for technology professionals.