Everyone knows that data volumes are growing exponentially. What’s not so clear is how to unlock the value all of that data holds. Enterprises are struggling to figure out how to store, manage and derive any real business value from Big Data.
Part of the problem is that traditional databases just aren't suited for mining Big Data insights. Legacy systems were designed decades ago, long before Big Data was a trend.
Enter Apache Hadoop, an open-source framework that enables the processing of large data sets in a distributed environment. With Hadoop, applications can be run on systems composed of thousands of nodes holding thousands of terabytes of data.
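For readers unfamiliar with the processing model Hadoop popularized, here is a minimal, single-machine sketch of the MapReduce pattern in plain Python. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on one machine (illustrative only)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big insights"])
```

Because each record is mapped independently and each key is reduced independently, the work can be split across as many machines as are available, which is what lets the approach scale to very large data sets.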
Gartner estimates the current Hadoop ecosystem market to be worth around $77 million. They expect that it will grow to $813 million by 2016. However, despite a few big-name backers, Hadoop is still relatively unproven in enterprise settings. Critics argue that while Hadoop works great as a processing platform, it's not all that good with queries. The add-ons Hive and Pig both help with this, but Hadoop still isn't quite a fully mature platform.
These startups intend to change that.
Alpine Data Labs
What they do: Provide data science solutions for Hadoop and Big Data
Headquarters: San Mateo, CA
CEO: Joe Otto, who previously ran Worldwide Sales for Greenplum, which is now part of EMC.
Funding: Alpine Data Labs is backed by a $7.5 million Series A round of funding from Sierra Ventures and Mission Ventures, along with EMC and Sumitomo Bank. The company is in the process of closing a Series B round, which is expected to raise between $10 million and $13 million.
Why they're on this list: While there are a ton of Big Data tools entering the market, many companies still struggle to gain actionable insight from their mountains of data.
According to Alpine Data, part of the problem is that it's much too difficult to get real insights out of Hadoop and other parallel platforms. Most companies don't know what to do with massive datasets, and few have gotten any further with Hadoop than batch processing and basic querying.
Alpine Data set out to simplify machine-learning methods and bring them to petabyte-scale datasets. Its tools deliver these methods through a lightweight web application with a code-free, drag-and-drop interface.
Alpine Data leverages the parallel processing power of Hadoop and MPP databases and implements data mining algorithms in MapReduce and SQL. Users interact with their data directly where it already sits and design analytics workflows without worrying about data movement or complex code. All this is done in a web browser, and Alpine Data then translates these visual workflows into a sequence of in-database or MapReduce tasks.
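To make the idea of translating a visual workflow into backend tasks concrete, here is a hypothetical sketch in Python. It does not reflect Alpine Data's actual implementation or API; the `Step` class, the `plan` function and the `"hadoop"`/`"database"` labels are all assumptions invented for illustration. Each step in the workflow is mapped to a task type based on where its data already sits.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One box in a hypothetical drag-and-drop workflow."""
    name: str    # operator the user dropped on the canvas, e.g. "filter"
    source: str  # where the data lives: "hadoop" or "database" (assumed labels)


def plan(workflow):
    """Translate workflow steps into an ordered list of (engine, operator) tasks."""
    tasks = []
    for step in workflow:
        if step.source == "hadoop":
            # Data in Hadoop: run the operator as a MapReduce task.
            tasks.append(("mapreduce", step.name))
        elif step.source == "database":
            # Data in an MPP database: run the operator in-database as SQL.
            tasks.append(("sql", step.name))
        else:
            raise ValueError(f"unknown data source: {step.source!r}")
    return tasks


workflow = [
    Step("filter", "database"),
    Step("join", "database"),
    Step("logistic_regression", "hadoop"),
]
tasks = plan(workflow)
```

The point of the sketch is that the data never moves: the planner picks the execution engine that matches where each data set is stored.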
Alpine Data's visual environment helps teams collaborate and quickly create and deploy analytics workflows and predictive models.
Customers include AT Kearney, Havas Digital, Zion Bank, Kaiser Permanente and CareCore.
Competitors: SAS dominates this market, but other startups are moving into this space too, including Platfora, Skytree, Revolution Analytics and Rapid-I.
Cloudera
What they do: Provide a Hadoop-based Big Data platform
Headquarters: Palo Alto, CA
CEO: Mike Olson, who was formerly CEO of Sleepycat Software, an embedded database company that was acquired by Oracle in 2006. After the acquisition, Olson spent two years at Oracle as VP for Embedded Technologies.
Funding: Cloudera has raised $140 million in venture capital to date. Its investors include Accel Partners, Greylock Partners, Ignition Partners, In-Q-Tel and Meritech Capital Partners.
Why they're on this list: Big Data is hot, and Cloudera is the pioneer that first developed a Hadoop-based platform for Big Data. Moreover, they're sitting on a mountain of VC cash and have a solid management team.
Cloudera lets users query all of their structured and unstructured data and gain a view beyond what's available from relational databases. Cloudera recently released Impala, an open-source query engine for Hadoop that enables interactive, real-time querying of massive data sets.
Customers include CBS Interactive, eBay, Expedia, Monsanto and Samsung.
Competitors: EMC Pivotal, Hortonworks, MapR. Intel recently joined the market as well, but it's too early to tell how serious they are about this space.