Two notable big data vendors launched Hadoop query and search tools this week. Cloudera announced general availability of its Impala SQL query engine, and MapR offered a beta version of LucidWorks Search alongside its M7 release.
InformationWeek’s Doug Henschen reported, “Last fall MapR set out to improve on HBase, Hadoop’s built-in NoSQL database. On Wednesday it delivered on that promise and it announced a next move: integrating search capabilities with its M7 Hadoop distribution with partner LucidWorks.”
Computerworld’s Joab Jackson noted, “LucidWorks Search is the commercial version of the open source Apache Lucene/Solr full-text search engine. With the new MapR integration, LucidWorks Search can search through either data on the Hadoop File Systems (HDFS) or on files on other file systems. LucidWorks Search offers snapshots and mirrors for high availability, and eliminates much of the work required to install Lucene/Solr from scratch. It also offers native support for more data sources, a graphical user interface and a security framework.”
GigaOm’s Derrick Harris observed, “There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop…. Launched as a private beta in May 2012 and made public in October, Impala is Cloudera’s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It’s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, only it is its own separate processing engine.”
ZDNet’s Andrew Brust added, “This fits nicely with Cloudera’s announcement yesterday that it has formed an alliance with BI powerhouse SAS. That alliance is not just a business arrangement either, as SAS engineers have adopted their technology to deploy physically over Hadoop clusters and perform their analyses in a parallel fashion. This is a huge deal as it avoids data movement between SAS and Hadoop, analyses can be performed over full data sets and not just samplings of the source data.”