Cyber-security and cyber-criminals are engaged in a constant arms race. The minute a software vulnerability gets patched or a security tool is able to block a class of attacks, malware writers shift gears and look for something new to exploit.
For years, security professionals have strived to get ahead of attackers. But the status quo, unfortunately, is that security is reactive, and it's hard to imagine how it could be otherwise. How can you block cyber-crooks until you know what they're up to? We don't live in a Minority Report type of world, after all, where psychics can help us sniff out crimes before they happen.
That doesn't mean we don't try. In the physical world, the desire to get ahead of crime has led to all sorts of dubious practices, everything from stop-and-frisk to NSA snooping. In cyber-space, security professionals are now turning to big data to try to discover patterns that may indicate a crime is coming, even if it has not yet occurred.
This is the crux of the NSA surveillance controversy, after all.
Less invasive is the new Pleiades tool developed by researchers at Georgia Tech, the University of Georgia and security startup Damballa. Pleiades doesn't intuit coming crimes, but it can identify zero-day attacks before security researchers even know what exactly the malware is.
Pleiades monitors network traffic for specific patterns of behavior common to malware. Recently, it identified several attacks based on command-and-control (C&C) calls routed to Non-Existent Domains (NXDomains). NXDomains are important because infected devices communicate periodically with C&C servers in order to get instructions that tell them to, for instance, launch a DDoS attack or send out spam.
Legacy security tools block these types of botnets through blacklists of known C&C domains. Of course, once attackers learned this, they shifted tactics and started to rely on Domain Generation Algorithms (DGAs), which dynamically produce a large number of random domain names. The latest version of Conficker, for instance, generates as many as 50,000 NXDomains per day.
Only a small subset of those domains is used for C&C, and each one is used for a very short period of time, making the C&C domain associated with the malware a hard-to-find moving target.
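To make the idea of a DGA concrete, here is a minimal sketch in Python. It is a toy illustration, not Conficker's actual algorithm or any real malware's: the function name `generate_domains`, the seed string and the reserved `.example` suffix are all assumptions chosen for the example. It shows only the general principle that malware and its operator can derive the same long list of random-looking names from a shared seed and the current date.

```python
import hashlib
from datetime import date


def generate_domains(seed, day, count=10, tld=".example"):
    """Toy DGA: derive pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration only. Real DGAs differ in their details.
    """
    domains = []
    for i in range(count):
        # Anyone who knows the seed and the date can reproduce this input.
        material = "{}|{}|{}".format(seed, day.isoformat(), i).encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters a-p so the label
        # looks like a random hostname.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in generate_domains("botnet-seed", date(2013, 9, 1), count=5):
        print(name)
```

Because the list changes every day and the operator needs to register only a handful of the names, a static blacklist is always a step behind. Most of the generated names are never registered, which is why lookups for them come back as NXDomains.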
What the researchers did was use machine learning to help Pleiades separate benign traffic anomalies (e.g., a bunch of people typing in Facbook.com, a common typo, not a malware signature) from suspicious ones, such as a large number of end devices connecting to a large number of NXDomains.
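The distinction between a harmless typo and DGA-style behavior can be sketched with a few lines of Python. This is a deliberately crude stand-in, a fixed threshold on distinct failed lookups per device, and not the machine learning that Pleiades actually uses. The log format (tuples of host, domain and response code), the function name `flag_suspicious_hosts` and the threshold of 20 are assumptions made for illustration.

```python
from collections import defaultdict


def flag_suspicious_hosts(dns_log, min_distinct_nx=20):
    """Return hosts that looked up many *different* non-existent domains.

    dns_log: iterable of (host, domain, rcode) tuples (hypothetical format).
    A simple threshold heuristic, not Pleiades' actual method.
    """
    nx_by_host = defaultdict(set)
    for host, domain, rcode in dns_log:
        if rcode == "NXDOMAIN":
            # A set counts each failed name once, so retyping the same
            # typo over and over does not inflate a host's score.
            nx_by_host[host].add(domain.lower())
    return sorted(
        host for host, names in nx_by_host.items()
        if len(names) >= min_distinct_nx
    )


if __name__ == "__main__":
    log = [("10.0.0.%d" % i, "facbook.com", "NXDOMAIN") for i in range(1, 6)]
    log += [("10.0.0.99", "rnd%03d.example" % i, "NXDOMAIN") for i in range(50)]
    print(flag_suspicious_hosts(log))
```

In the usage example, five devices that each mistype one popular address stay under the threshold, while a single device that fails on 50 different names gets flagged. A production system would need far more context than a per-host count, which is where correlating traffic across many devices comes in.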
"We are able to correlate network patterns with other information we have, so we don't have to reverse engineer the malware or even know what it is," said Marshall Bockrath-Vandegrift, engineering lead on Damballa's R&D team.
Does big data tip the scales in favor of security vendors?
It would seem to. While it's easy to imagine malware writers trying to mine big data insights from infected devices, it would be harder for them to see the forest for the trees. "To conduct this type of analysis, you have to have access to the network data," Bockrath-Vandegrift said. ISPs and major enterprises aren't going to hand over their data to cyber-crooks.
However, could criminals shift their sights, targeting, say, Hadoop databases?
"I suppose it's possible, but exfiltrating terabytes of data isn't easy," Bockrath-Vandegrift said.
That doesn't mean cyber-crooks will sit on their hands and accept this new reality. In fact, the Gameover variant of the ZeuS banking trojan didn't rely on standard C&C communications, instead taking advantage of peer-to-peer communications, which are much harder to detect.
"What no one realized, though, and what we were able to detect, is that Gameover has a backup command-and-control that uses DGAs," Bockrath-Vandegrift said. "We were able to analyze network data to identify pockets of behavior that we could correlate with known ZeuS command-and-control points, and we were able to learn something about this Trojan that other security experts missed when they reverse engineered it."
Pleiades has been flexing its muscles lately. Researchers at Georgia Tech, Damballa and Secureworks recently used it to discover a new form of the trojan Pushdo almost three months before the actual malware was discovered and publicized by major antivirus vendors.
Pleiades was also used to uncover the Flashback malware, which ultimately infected more than 600,000 Macintosh devices. Pleiades detected this zero-day attack weeks before the security community identified and announced the malware.
Fending off bad guys or spying on everyone?
Of course, for all of the good big data could do, there is also a dark side. Governments, ISPs and tech companies can all exploit big data to invade our privacy. The NSA surveillance scandal is only the latest example of this fact.
The NSA will argue that it collected all of our metadata to ferret out terrorists. Critics will point out that the potential for abuse is just too great to allow this sort of data collection to go on unchecked. And the latest headlines are proving the critics right, with revelations that the NSA spied on US citizens, the UN and our allies.
Meanwhile, reports are coming out of individual analysts using NSA surveillance capabilities to track former spouses and spy on love interests.
So, the same metadata patterns that could alert an analyst to the fact that a suspect is helping to coordinate a terrorist attack could be used to help a suspicious analyst figure out whether his or her spouse is cheating.
Government snooping has become such a problem that Facebook recently released a report that trumpeted the fact that it denied more government requests for user data than Google did. Never mind the fact that Facebook handed over user data 79 percent of the time when the US government was the requesting party. That's still better than Google, which turned over data 88 percent of the time when the US made the request. I guess that's one way to polish a turd, but it's not terribly convincing.
Effect on cloud providers
This all may seem like a mere nuisance, but NSA snooping could hurt cloud providers, big time. To meet their own privacy requirements, European companies may end up being prohibited from using US-based cloud services like Amazon and Rackspace for their big data implementations.
These companies are in a bind because of the US Patriot Act: if, say, Amazon gets subpoenaed, it must turn over not just all of that data (data that can provide a very specific competitive advantage to its owner) but also the keys to decrypt it.
"Big data in the cloud offers massive scalability and high performance, but for it to truly be embraced by the enterprise, big data solutions must also include fundamental security features like encryption and key management," said Larry Warnock, CEO of security vendor Gazzang.
To address this problem, Gazzang released a new product, CloudEncrypt, which provides data encryption and key management at every stage of the Amazon Elastic MapReduce (Amazon EMR) data lifecycle.
"Amazon EMR enables businesses, researchers, data analysts and developers to easily and cost-effectively process vast amounts of data, and it’s important to our customers that the data inputs and outputs remain secure," said Terry Wise, Head of Worldwide Partner Ecosystem, Amazon Web Services.
CloudEncrypt allows companies to keep their own keys. So, if Amazon gets subpoenaed, the data the government gets is encrypted, and the customer (not Amazon) retains full ownership of the keys.
Thus, the US government has to take that next step and subpoena the target company. The way things are now, the target company may not even know it was a target. By holding onto the keys, the company can get its lawyers on the case and decide whether or not it should fight the subpoena.