Scaling systems on a distributed basis to handle petabytes of information is no easy task, though it’s one that the open source Apache Hadoop project delivers for such big names as Facebook, Google and Yahoo.
Now, the Hadoop framework for running applications across large clusters could see a boost — thanks to the first official commercial distribution from its lead backer, Cloudera.
With today’s release of the Cloudera for Hadoop distribution, Cloudera is aiming to push Hadoop into wider usage by making it easier and more flexible to deploy.
It’s an important development for one of the key players in a closely watched project responsible for powering projects and products at some of the biggest Internet firms.
For instance, Cloudera founder Christophe Bisciglia, who formerly served as the manager of Google’s Hadoop cluster before setting up Cloudera in 2008, said that the search engine leader uses Hadoop to power its academic datacenter. The program, in partnership with the National Science Foundation, makes thousands of CPUs and lots of storage available for data research across different disciplines.
The launch of Cloudera’s first release coincides with news that the company, which makes its money by providing commercial support, training and consulting for Hadoop, has closed $5 million in venture funding to grow its commercial offerings.
Even with its own commercial distribution, Bisciglia said, Cloudera will continue to actively participate in Hadoop’s development and to work with the other big names involved in the project.
“We work closely with developers at Yahoo!, Google and Facebook, and we expect that to continue,” Bisciglia told InternetNews.com. “They have solutions for some of the deployment problems we are addressing for regular users, but it’s obvious that we all see the value in converging the code that runs on production systems. I’d rather not speculate on specifics, but I am excited to continue working with [those] organizations.”
The idea of a clustered file system is not unique to Hadoop. Oracle has its own Oracle Clustered File System (OCFS) and Red Hat has its Global File System (GFS). Yet Bisciglia argued that what Hadoop does is somewhat different, noting that OCFS and GFS are designed to implement the same requirements as regular filesystems, but in a distributed manner.
“Hadoop and HDFS throw out the past requirements, and are optimized for working with very large data sets, many terabytes to petabytes,” Bisciglia said. “What this means is things like accessing a small chunk (a few KB, like an individual Web page or document) of data from a random file is rather slow in comparison, but Hadoop excels at using many processors and disks to store [and] process exceedingly large volumes of data.”
According to Bisciglia, Cloudera’s distribution for Hadoop sweetens the deal for using Hadoop — lowering the barrier to entry for enterprise users by including a number of tweaks and common, important tools.
As a result, while Cloudera’s version of Hadoop is based on the most recent stable version of the core open source project, its distribution may not be exactly the same as the open source project.
“We sometimes include code that we developed to resolve customer issues or feature requests, and we may include that code while we are in the process of contributing it back to Apache,” Bisciglia said. “The core will always be the same, but our packaging and user experience will be recognizable.”
For instance, Cloudera includes the Hadoop Distributed File System (HDFS), one of the key file systems supported under Hadoop, and one that Cloudera claims can support tens of millions of files in a single instance.
The distribution also includes MapReduce technology, an open source project commonly used with Hadoop that lets applications split their work into many blocks that are processed in parallel. Meanwhile, data summary analysis is provided by way of the Hive data warehousing infrastructure, another open source tool included in the distribution.
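To give a rough sense of the MapReduce idea described above, here is a minimal word-count sketch in plain Python. It is an illustration only and does not use Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the blocks are processed one after another on a single machine, whereas a real cluster would spread them across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(blocks):
    # Each block is independent, so in principle each could be mapped on a
    # different machine; this sketch simply loops over them serially.
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(blocks))
```

Because the map step looks at one block at a time and the reduce step looks at one key at a time, both phases can in principle be parallelized, which is the property the article attributes to MapReduce.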
Currently, Cloudera’s Hadoop distribution is being made available for Red Hat Enterprise Linux and its variants, though Bisciglia said wider support is on the roadmap.
Moving forward, Bisciglia said he expects that companies in the Web 2.0 space will adopt Hadoop as well as those in biotech, financial services and retail. He added that the key challenge to wider adoption is all about ease of use and deployment, which is what Cloudera is trying to fix.
“Hadoop needs to be just as easy to deploy and use as any other piece of enterprise software,” Bisciglia said.
“We’re taking steps in that direction by using standard tools for packaging and deployment, and you can expect to see similar improvements and standardizations for developers and users [of] enterprise Hadoop clusters,” he added. “The primary way we overcome these challenges is by giving our distribution away for free, and actively engaging the community in solving problems.”
This article was first published on InternetNews.com.