Six Myths About Hadoop and Big Data

Hadoop is a great Big Data tool, but it might not be what you expect.

4. We'll need to hire a bunch of programmers to use Hadoop.

Depending on what you plan to do, this myth may prove true. If you plan to build the next great Hadoop-based Big Data suite, you'll need programmers who can write Java and who understand specialized MapReduce programming.
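For a sense of what that programming involves, consider the canonical word-count job. The sketch below uses Hadoop Streaming, a standard Hadoop facility that lets the map and reduce steps be plain scripts that read stdin and write tab-separated key/value pairs to stdout; Python is used here for brevity, and the file names are illustrative:

    # mapper.py -- emit one "word<TAB>1" pair per word of input.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t1" % word.lower())

    # reducer.py -- Hadoop sorts mapper output by key before the reduce
    # step, so all counts for a given word arrive consecutively and can
    # be summed in a single pass.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word and current_word is not None:
            print("%s\t%d" % (current_word, count))
            count = 0
        current_word = word
        count += int(value)
    if current_word is not None:
        print("%s\t%d" % (current_word, count))

The job is launched with the hadoop-streaming JAR that ships with Hadoop (the exact path varies by version), along the lines of: hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper mapper.py -reducer reducer.py. Even this toy example hints at why the native Java API, with its Mapper and Reducer classes, job configuration and build tooling, counts as specialized work.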

However, if you're content to build on the work of others, programming shouldn't scare you off. Data integration vendor Syncsort recommends leaning on Hadoop-compatible data integration tools that let analysts run advanced queries without doing any coding.

Most data integration tools have GUIs that abstract away the complexity of MapReduce programming, and many come with pre-built templates.

Moreover, startups including Alpine Data Labs, Continuuity and Hortonworks offer tools to simplify Big Data in general, and Hadoop in particular.

5. Hadoop isn't suitable for SMBs.

Many SMBs fear that they'll be locked out of the Big Data trend. The big vendors, the IBMs and Oracles of the world, predictably peddle big, expensive solutions. But that doesn't mean there aren't SMB-friendly tools out there.

Cloud computing is rapidly democratizing access to sophisticated technologies. "The cloud is turning Capex into Opex," Big Data author Phil Simon notes. "You can take advantage of the same cloud services that Netflix does, and the same thing is starting to happen with Big Data. A company of five can use Kaggle."

Kaggle calls itself a "marketplace that bridges the gap between data problems and data solutions." For instance, startup Jetpac offered $5,000 to someone who could come up with an algorithm that would identify compelling vacation photographs. Most vacation photos are pretty awful, after all, and separating the wheat from the chaff is a tedious, time-consuming process.

Jetpac had people manually rate 30,000 photos, then sought an algorithm that would rank photos the same way the human raters did, just by analyzing metadata (photo size, captions, descriptions, etc.). If Jetpac had tried to develop this in-house, it would have spent a heck of a lot more than $5,000, and it would have ended up with a single solution rather than its pick of several.
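At heart, the contest task is a supervised-learning problem: use the 30,000 human ratings as training labels and the metadata as features, then rank new photos by the model's predicted score. Here's a minimal sketch of the idea, not Jetpac's actual solution; the feature names and numbers are invented, and scikit-learn is assumed:

    # Sketch: learn to predict human photo ratings from metadata,
    # then rank unseen photos by predicted score. All values below
    # are illustrative, not real Jetpac data.
    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical per-photo features:
    # [caption_length, width, height, description_length]
    X_train = [
        [42, 1600, 1200, 120],
        [3, 640, 480, 0],
        [55, 2048, 1536, 200],
        [8, 800, 600, 10],
    ]
    y_train = [4.5, 1.2, 4.8, 2.0]  # human ratings (Jetpac collected 30,000)

    model = GradientBoostingRegressor()
    model.fit(X_train, y_train)

    # Score and rank new photos, most appealing first.
    new_photos = [[30, 1024, 768, 60], [5, 640, 480, 5]]
    for score, photo in sorted(zip(model.predict(new_photos), new_photos),
                               key=lambda pair: -pair[0]):
        print(score, photo)

The value in a contest like this lies less in the model-fitting boilerplate than in discovering which metadata features actually track human judgment.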

In fact, Jetpac's image-processing tool helped the company land $2.4 million in VC funding from Khosla Ventures and Yahoo co-founder Jerry Yang.

6. Hadoop is cheap.

This is a common misconception about anything open source. Just because you can reduce or eliminate the initial cost of purchasing software doesn't mean you'll necessarily save money. One of the problems with the cloud, for instance, is that it's so easy to spin up a science project on Amazon that developers of all sorts throw projects up on AWS, forget about them, and keep paying for them.

And virtual server sprawl already makes physical server sprawl look quaint.

While Hadoop helps you store and analyze data, how will you get legacy data into the system? How will you visualize the data? How will you share it? How will you secure data as it is shared more often across the enterprise?

A Hadoop solution is actually a patchwork of solutions. You can turn to a company like Cloudera for a complete enterprise solution, or you can start putting together a highly customized solution yourself. Whatever route you choose, you'll need to budget carefully because free software is never really free.

Jeff Vance is a Santa Monica-based writer. He's the founder of Startup50, a site devoted to emerging tech startups. Connect with him on Twitter @JWVance.
