Tuesday, April 23, 2024

Fear of Big Data

As with any trend that comes on too fast and too strong, Big Data is experiencing pushback. There are four neat categories most of these critiques fall into.

First, a good deal of the commentary is predictable skepticism expressed by the lightly informed, such as David Brooks’ “What Data Can’t Do” column in The New York Times.

The second category of critiques focuses on semantics, arguing that Big Data is really about analytics, not the size of the data, or that Big Data isn’t really all that big, etc.

Fine, but who cares? Maybe if the marketers and analysts had favored the term “data consolidation,” this line of thinking would have never materialized in the first place.

A third type of critique, such as that voiced by President Obama’s 2012 campaign CTO, Harper Reed, who famously quipped that “Big Data is bull****,” seems to fall under the heading of “smart people spouting bull**** in order to stir up controversy.”

Mr. Reed is correct when he complains about the marketing and PR spin surrounding Big Data, but when is there not? In a world where pre-product startups regularly refer to themselves as “leading providers,” tuning out this nonsense noise is just the price of admission.

Finally, there is the fourth category of Big Data pushback, and this one merits attention: fear.

NSA overreach makes tinfoil-hat paranoia sound sane

The latest NSA Big Data-related leak involves license plates. Yep, license plates. Cities all over the U.S. (and the world) have been deploying license plate scanners to catch people running red lights, to prevent toll road cheats, and to study traffic patterns.

The most recent disclosure from the Snowden treasure trove revealed that the Department of Homeland Security (DHS) wants to build a nationwide database that collects information from pretty much every license-plate reader it can find.

In other words, rather than just monitoring citizens’ communications and tracking our mobile phone GPS coordinates, the government now wants to track us as we travel by car – which should outrage anyone even vaguely concerned about privacy.

Facing a torrent of criticism, DHS recently withdrew its request. Critics point out, however, that the media is getting this story wrong: naïve journalists believe the database has been scuttled, which is not true. Such a database already exists, run by private company Vigilant Solutions. The DHS proposal was just to gain broader access rights to use it.

So, the U.S. government has a gigantic database containing information about me that should be private, a database that is accessed often, by a number of agencies, with little oversight? Where have I heard that before? Oh, right, this is the common theme in all of the Snowden leaks.

Obviously, this sort of Big Data pushback has merit. Lots of it. But fears about the loss of privacy aren’t just limited to the government.

“Every person’s data footprint grows each time they use the Internet, a mobile phone, an iPad, or even a landline phone,” said Andy Rusnak, Americas Enterprise Intelligence Leader at EY (formerly Ernst & Young). “It also grows when people drive through a toll lane with an EZ Pass, swipe a credit card, or turn the channel on their TV. And today, face recognition technology exists to expand people’s data footprint when they walk into a store – even if they don’t have a mobile device with them.”

When you consider all the corporate and government entities that want to track us in granular detail, it makes you wonder whether privacy is a thing of the past. How can individual citizens ever reclaim their privacy when they have so little of it left?

There’s no easy answer. Regulations protect certain key data points about us, such as health information, but as data analysis tools evolve, will that matter? Facebook can already predict with eerie accuracy whether or not romantic relationships will endure, so it’s no big stretch to imagine that if you, say, contract malaria on your trip to Guinea-Bissau, Facebook or Google will certainly know it.

Rusnak believes that corporations should be proactive about protecting privacy, embracing a do-unto-others policy. “If you focus your use of Big Data on making your customer’s life better, then your use of Big Data stays true to the social contract inherent in the golden rule,” he said. “If you focus your Big Data efforts on things that solely benefit your company and not the public or your customers, then you’ve broken the social contract and have probably gone too far in using Big Data.”

Rise of the machines

I think another factor driving the fear of Big Data is the rise of automation and robotics. Workers are, yet again, grappling with the fact that many things we do today will be done by machines tomorrow. And the parts of the job left for humans will tend to be the most error-prone elements in the production chain.

North American workers have, for years, feared that their jobs would move to India or China or some other developing country, and those fears have been borne out. Now, however, workers in India, China, etc. worry that the jobs they gained through outsourcing will be snatched away by robots.

This isn’t always a bad thing. My Roomba does nothing other than save me (and my wife) time. Robots and automation are good things, taking over mundane, repetitive tasks, and reducing errors as they do. But they’re not job creators.

In this age of an eroding middle class and sky-high inequality, fears about losing any sort of foothold in the economy, no matter how tenuous it may be, are valid.

The fear of machines dovetails with fears about Big Data. Big Data should prove to be a boon to Artificial Intelligence, creating smarter and smarter robots (and computer systems), and AI will pour more data points into more and more applications. As a result, knowledge workers will, once again, have to rethink how they fit into the digital economy. This doesn’t mean we’ll be displaced (although many of us probably will), but that we’ll need to replace our easily automated skills with ones that machines just can’t accomplish.

Fear of being wrong

At this early stage of AI and robotics, machines still only do what we tell them to. Humans retain the controls, which is a mixed blessing, since humans are so error-prone. If we fall into the trap of comparing ourselves to automation, we come up short – but only because it’s a false comparison.

In the field of data analytics, humans make mistakes all the time, but they also add a level of knowledge and awareness that no robot will be able to match for the foreseeable future.

“Scientists using the same dataset and the same tools may come to different conclusions when analyzing a problem. Sometimes the answer may be open to interpretation, but in many cases, one of the data scientists’ analyses will just be plain wrong,” said Sandy Steier, CEO of 1010data, a provider of Big Data discovery and sharing tools. “Further, even when the data scientist has done everything correctly given the hard data, there are often additional, less tangible factors that need to be considered, the kind of factors that experienced business people just know.”

That last insight, “factors that experienced business people just know,” is one that I suspect will haunt Big Data for a good while. There’s a ton of truth to the “just knowing” of experts. I’ve been a journalist for a couple of decades now, and there are things I just know about storytelling, connections I’ve made and experiences I’ve collected that inform my knowledge on a subconscious level.

The flip side of that is that there are many things we think we “just know,” which we’re completely wrong about. Big Data is very good at exposing these things. After all, in Big Data’s origin story, Moneyball, Oakland A’s GM Billy Beane wondered why scouts just knew that he’d be a pro ballplayer. It was because he simply looked like a ballplayer. His frame, gait, arm strength, and demeanor all told scouts he’d be a major league star. He wasn’t. When Beane later rose to become the GM of the A’s, he finally asked himself whether he had failed to live up to his abilities, or whether the scouts were just wrong. He turned to data analysis to find out. The answer: the scouts’ assessments were hopelessly clouded by bias. Too often they relied on anecdotes in place of measurable evidence.

Data makes those biases abundantly clear, and, perhaps, that’s what people fear the most about Big Data: it has the potential to show us where we’re wrong and will have the evidence to prove it. No one likes being told they’re wrong, but in the Big Data age, we’d better start getting used to it.


Jeff Vance is a technology journalist based in Santa Monica, California. Connect with him on LinkedIn, follow him on Twitter @JWVance, or add him to your cloud computing circle on Google Plus.

Photo courtesy of Shutterstock.
