Big Data Calls for Big Storage: Page 2

By its very definition, Big Data requires storage capacity that never stops growing.




3. Management. There are a million and one storage management tools out there. The most basic one, still in wide use even in business, believe it or not, is a simple Excel spreadsheet, but vendors from EMC to Hitachi Data Systems to NetApp offer solid storage management solutions. The trouble is that data-sharing standards are still lacking, and escaping vendor lock-in is a never-ending challenge.

4. The WAN. As cloud computing becomes mainstream, the simplest way to break down data silos is to leverage the cloud to help with everything from search to backups to raw processing. However, the more storage moves into the cloud, the more the WAN will impede Big Data progress. The WAN, unfortunately, isn’t keeping up with Moore’s Law, nor with its storage-specific analog, Kryder’s Law. Any Big Data storage solution must include some combination of redundant MPLS links, WAN optimization and CDN services.

5. Security. As you break down data barriers, certain people may get access to data (say, HR records) that they should never, ever see. Thus, authentication, access control and security in general are a major Achilles’ heel of Big Data storage.
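To put the WAN constraint from point 4 in rough numbers, the sketch below estimates how long it takes to move a data set over a link of a given speed. The function name, the 10 TB data set and the 80 percent efficiency figure are illustrative assumptions, not measurements; real throughput depends on latency, protocol overhead and contention.

```python
def transfer_hours(data_terabytes: float, link_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to push a data set across a WAN link.

    efficiency is an ASSUMED fraction of the nominal bandwidth that is
    actually usable; 0.8 is a placeholder, not a measured value.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Hypothetical 10 TB data set over three nominal link speeds.
for mbps in (100, 1_000, 10_000):
    print(f"10 TB over {mbps:>6} Mbps: {transfer_hours(10, mbps):6.1f} hours")
```

Under these assumptions, a 100 Mbps link needs more than eleven days to move 10 TB, which is why WAN optimization and redundant links figure so heavily in cloud-bound storage plans.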

Hadoop helps tame data

Since traditional relational (SQL) databases weren’t designed with Big Data in mind, a Big Data alternative eventually emerged: Apache Hadoop.

According to the Apache Software Foundation, Hadoop is a “framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

“Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

Key modules to consider, as far as Big Data storage is concerned, include HDFS, a distributed file system that provides high-throughput access to application data; Hive, a data warehouse infrastructure; and Chukwa, a data collection system for managing large distributed systems.
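HDFS tolerates machine failures largely by splitting files into large blocks and keeping several copies of each block on different nodes. As a rough sketch of what that means for capacity planning, the snippet below estimates the footprint of a single file. The function name is hypothetical, and the 128 MB block size and replication factor of 3 are assumed here as commonly used defaults; both are configurable per cluster (via `dfs.blocksize` and `dfs.replication`), so check your own settings.

```python
def hdfs_footprint(file_bytes: int,
                   block_bytes: int = 128 * 1024**2,   # assumed block size
                   replication: int = 3):              # assumed replication factor
    """Return (blocks, block_replicas, raw_bytes) for one file.

    The final block of a file only occupies as much disk as it holds,
    so raw usage is simply the file size times the replication factor.
    """
    if file_bytes < 0 or block_bytes <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and factors positive")
    blocks = -(-file_bytes // block_bytes)   # integer ceiling division
    return blocks, blocks * replication, file_bytes * replication


# Hypothetical 1 GiB log file.
blocks, replicas, raw = hdfs_footprint(1024**3)
print(f"{blocks} blocks, {replicas} block replicas, {raw / 1024**3:.0f} GiB raw")
```

The practical takeaway is that, with these settings, raw disk requirements run about three times the size of the data itself, before any compression.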

The future of Big Data storage: the cloud

Big Data storage is quickly becoming a subset of cloud storage. As data centers are virtualized, and as more data is moved into third-party data centers, Big Data and cloud storage challenges (and opportunities) will begin to merge.

Granted, not all applications will move off-site, nor will every application be open to Big Data sharing. However, as security and access-rights solutions strengthen, don’t be surprised if, in an ideal world, nearly every application under the sun is able to share data with nearly every other one. Of course, standards fights will brew, vendors will do their best to lock customers into their solutions, and problems like data loss and IP theft will undermine this ideal world, but it will be feasible from a technical standpoint.

Most enterprises will get their cloud storage feet wet with data backups. Eventually, they will use APIs to connect their on-premises data repositories with cloud and SaaS services, such as Salesforce.com, and cloud storage will evolve into Big Data storage. Further out, as most infrastructure moves into the cloud, cloud providers will offer an array of Big Data storage options as a service.

Along the way, flash and solid-state drives (SSDs) may make spinning disk drives obsolete, in-memory storage could break out of its Java purgatory, and biologists working with the human genome may well provide storage insights derived from DNA and gene sequencing.

Top Big Data storage vendors

Note: This is by no means an exhaustive list, and placing vendors in one category or another is largely subjective.


EMC (Key Big Data acquisitions: Greenplum and Isilon)

IBM (Key acquisitions: Cognos, Netezza, OpenPages, Algorithmics, Texas Memory Systems)

Hitachi Data Systems


Oracle (Key acquisition: Endeca)

HP (Key acquisitions: Autonomy and Vertica)

Cisco (Key acquisition: Truviso)

Dell (Key acquisition: Compellent)


DataDirect Networks


GridIron Systems


Nimble Storage





Violin Memory Systems

Virsto Storage


Photo courtesy of Shutterstock.


