Big data security is a constant concern because big data deployments are valuable targets for would-be intruders. A single ransomware attack can leave your deployment subject to ransom demands. Even worse, an unauthorized user may gain access to your big data and siphon off valuable information to sell. The losses can be severe: your IP may be spread to unauthorized buyers, you may face fines and judgments from regulators, and your reputation can suffer badly.
Securing big data platforms takes a mix of traditional security tools, newly developed toolsets, and intelligent processes for monitoring security throughout the life of the platform.
Big Data Security Overview
Big data security’s mission is clear enough: keep out unauthorized users and intrusions with firewalls, strong user authentication, end-user training, and intrusion prevention systems (IPS) and intrusion detection systems (IDS). In case someone does gain access, encrypt your data in-transit and at-rest.
This sounds like any network security strategy. However, big data environments add another level of complexity because security tools must operate across three data stages that are not all present in a typical network. These are 1) data ingress (what’s coming in), 2) stored data (what’s stored), and 3) data output (what’s going out to applications and reports).
Stage 1: Data Sources. Big data comes from a variety of sources and data types. User-generated data alone can include CRM or ERM data, transactional and database data, and vast amounts of unstructured data such as email messages or social media posts. In addition, you have the whole world of machine-generated data, including logs and sensor readings. You need to secure this data in-transit from sources to the platform.
Stage 2: Stored Data. Protecting stored data takes mature security toolsets including encryption at rest, strong user authentication, and intrusion prevention and planning. You will also need to run your security toolsets across a distributed cluster platform with many servers and nodes. In addition, your security tools must protect log files and analytics tools as they operate inside the platform.
Stage 3: Output Data. The entire reason for the complexity and expense of the big data platform is being able to run meaningful analytics across massive data volumes and different types of data. These analytics output results to applications, reports, and dashboards. This extremely valuable intelligence makes for a rich target for intrusion, and it is critical to encrypt output as well as ingress. Also, secure compliance at this stage: make certain that results going out to end-users do not contain regulated data.
One of the challenges of big data security is that data is routed through a circuitous path and can be vulnerable at more than one point along the way.
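Securing data in transit from sources to the platform (Stage 1) usually means requiring TLS on every connection. The following is a minimal sketch using only Python's standard library. The helper name `make_ingest_tls_context` is hypothetical and not taken from any big data product; a real deployment would follow the platform's own client configuration.

```python
import ssl


def make_ingest_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context for sending data to the platform.

    Hypothetical helper, for illustration only.
    """
    # create_default_context() turns on certificate verification and
    # hostname checking for client connections.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_ingest_tls_context()
```

The resulting context can be passed to standard-library clients that accept one, so that a source refuses to send data over an unverified or outdated connection.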
Big Data Security Challenges
Several challenges can compromise big data security. Keep in mind that these challenges are by no means limited to on-premises big data platforms; they also pertain to the cloud. When you host your big data platform in the cloud, take nothing for granted. Work closely with your provider to overcome these same challenges with strong security service level agreements.
Typical Challenges to Securing Big Data:
- Advanced analytic tools for unstructured big data and nonrelational databases (NoSQL) are newer technologies in active development. It can be difficult for security software and processes to protect these new toolsets.
- Mature security tools effectively protect data ingress and storage. However, they may not have the same impact on data output from multiple analytics tools to multiple locations.
- Big data administrators may decide to mine data without permission or notification. Whether the motivation is curiosity or criminal profit, your security tools need to monitor and alert on suspicious access no matter where it comes from.
- The sheer size of a big data installation, terabytes to petabytes large, is too big for routine security audits. And because most big data platforms are cluster-based, this introduces multiple vulnerabilities across multiple nodes and servers.
- If the big data owner does not regularly update security for the environment, they are at risk of data loss and exposure.
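The point above about monitoring access "no matter where it comes from" can be sketched as a simple volume check over audit events. This is an illustration under stated assumptions: the `AccessEvent` format, the function name, and the threshold are all hypothetical, and real monitoring tools use far richer signals than a single count.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class AccessEvent(NamedTuple):
    """One read against the platform, in a hypothetical audit-log format."""
    user: str
    role: str
    records_read: int


def flag_heavy_readers(events: Iterable[AccessEvent], max_records: int) -> List[str]:
    """Return users whose total records read exceed max_records.

    Roles are deliberately ignored: administrators are checked like anyone else.
    """
    totals = Counter()
    for event in events:
        totals[event.user] += event.records_read
    return sorted(user for user, total in totals.items() if total > max_records)


events = [
    AccessEvent("alice", "analyst", 1_200),
    AccessEvent("bob", "admin", 40_000),
    AccessEvent("bob", "admin", 75_000),
    AccessEvent("carol", "analyst", 900),
]
suspicious = flag_heavy_readers(events, max_records=100_000)
```

Here `suspicious` contains only the administrator account, whose two reads together exceed the assumed threshold.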
Big Data Security Technologies
None of these big data security tools are new. What is new is their scalability and the ability to secure multiple types of data in different stages.
- Encryption: Your encryption tools need to secure data in-transit and at-rest, and they need to do it across massive data volumes. They must handle many different types of data, both user- and machine-generated, and work with different analytics toolsets and their output data. They also need to support common big data storage systems, including relational database management systems (RDBMS), non-relational (NoSQL) databases, and specialized filesystems such as Hadoop Distributed File System (HDFS).
- Centralized Key Management: Centralized key management has been a security best practice for many years. It applies just as strongly in big data environments, especially those with wide geographical distribution. Best practices include policy-driven automation, logging, on-demand key delivery, and abstracting key management from key usage.
- User Access Control: User access control may be the most basic network security tool, but many companies practice minimal control because the management overhead can be so high. This is dangerous enough at the network level, and can be disastrous for the big data platform. Strong user access control requires a policy-based approach that automates access based on user and role-based settings. Policy-driven automation manages complex user control levels, such as multiple administrator settings that protect the big data platform against inside attack.
- Intrusion Detection and Prevention: Intrusion detection and prevention systems are security workhorses. This does not make them any less valuable to the big data platform. Big data’s value and distributed architecture make it an attractive target for intrusion attempts. IPS enables security admins to protect the big data platform from intrusion, and should an intrusion succeed, IDS detects it so that it can be quarantined before it does significant damage.
- Physical Security: Don’t ignore physical security. Build it in when you deploy your big data platform in your own data center, or carefully do due diligence around your cloud provider’s data center security. Physical security systems can deny data center access to strangers or to staff members who have no business being in sensitive areas. Video surveillance and security logs help deter and document unauthorized access.
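The policy-based, role-driven access control described in the list above can be sketched as a default-deny lookup table. The roles, actions, and resource names below are hypothetical and chosen only to illustrate separating administrator duties; this is not the access model of any particular platform.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical policy: role -> set of (action, resource) pairs it may perform.
POLICY: Dict[str, FrozenSet[Tuple[str, str]]] = {
    "analyst": frozenset({("read", "reports")}),
    "data_engineer": frozenset({("read", "raw_data"), ("write", "raw_data")}),
    # Administrator duties are split so that no single admin role can both
    # manage the cluster and read the data it holds.
    "cluster_admin": frozenset({("manage", "nodes")}),
    "security_admin": frozenset({("manage", "policies"), ("read", "audit_logs")}),
}


def is_allowed(role: str, action: str, resource: str) -> bool:
    """Default-deny check: permit only what the policy explicitly lists."""
    return (action, resource) in POLICY.get(role, frozenset())


analyst_can_read_reports = is_allowed("analyst", "read", "reports")
admin_can_read_raw_data = is_allowed("cluster_admin", "read", "raw_data")
```

Because unknown roles and unlisted actions fall through to "deny", adding a new user type requires an explicit policy entry rather than a manual exception.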
Big Data Security Companies
Digital security is a huge field with thousands of vendors. Big data security is a considerably smaller sector given its high technical challenges and scalability requirements. However, big data owners are willing and able to spend money to secure their valuable deployments, and vendors are responding. Below are a few representative big data security companies.
- Thales (Vormetric): Vormetric Data Security Platform offers security controls and encryption across all three stages of data and big data platforms: incoming data, stored data, and results output. Its technologies include encryption, key management, and access control. It also audits and reports for governance and compliance purposes.
- Cloudwick: Cloudwick Data Analytics Platform (CDAP) builds on Intel Xeon and Cloudera's Hadoop distribution. CDAP is a managed data security hub that aggregates security features from multiple analytics toolsets, machine learning projects, and traditional IDS and IPS.
- IBM: IBM Security Guardium monitors security and compliance in big data and NoSQL environments. It includes sensitive data discovery and classification, vulnerability assessment, and data and file monitoring. Guardium also masks, encrypts, blocks, alerts, and quarantines suspicious access attempts.
- Logtrust: Logtrust partnered with Panda Security to provide the Advanced Reporting Tool (ART) and Panda Adaptive Defense. ART automatically reports attacks and suspicious digital behaviors and detects internal threats to big data systems and networks. Panda Adaptive Defense correlates data from multiple sources, which is critical in big data environments with multiple nodes and data sources.
- Gemalto: Gemalto SafeNet protects big data platforms in the cloud, data center, and virtual environments. The toolset includes strong authentication and digital signing solutions, data-at-rest and in-motion encryption, and cryptographic key security and management. Gemalto integrates with leading big data providers including MongoDB, Cloudera, Couchbase, DataStax, Hortonworks, IBM, and Zettaset.
Who Is Responsible for Big Data Security?
A big data deployment crosses multiple business units. IT, database administrators, programmers, quality testers, InfoSec, compliance officers, and business units are all responsible in some way for the big data deployment. Who is responsible for securing big data?
The answer is everyone. IT and InfoSec are responsible for policies, procedures, and security software that effectively protect the big data deployment against malware and unauthorized user access. Compliance officers must work closely with this team to protect compliance, such as automatically stripping credit card numbers from results sent to a quality control team. DBAs should work closely with IT and InfoSec to safeguard their databases.
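As an illustration of the compliance example above, a results pipeline might strip card numbers before output leaves the platform. The sketch below is an assumption-laden example, not a compliance tool: the function names and the `[REDACTED]` placeholder are hypothetical, and it only catches 13- to 19-digit numbers that pass the Luhn checksum.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def passes_luhn(digits: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_card_numbers(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace Luhn-valid card-like numbers in text with a placeholder."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return placeholder if passes_luhn(digits) else match.group()

    return CARD_PATTERN.sub(replace, text)


clean = redact_card_numbers("Order 1042 paid with 4111 1111 1111 1111 on file")
```

The Luhn check reduces false positives on other long numbers such as order or tracking IDs, at the cost of missing malformed card numbers; a production filter would need broader rules.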
Finally, end-users are just as responsible for protecting company data. Ironically, even though many companies use their big data platform to detect intrusion anomalies, that big data platform is just as vulnerable to malware and intrusion as any stored data. One of the simplest ways for attackers to infiltrate networks, including big data platforms, is email. Although most users will know to delete the usual awkward attempts from Nigerian princes and fake FedEx shipments, some phishing attacks are extremely sophisticated. Whether you are administering security for your big data platform or you are an end-user combing through your inbox, never ignore the power of a lowly email.
Secure your big data platform from high threats and low, and it will serve your business well for many years.