Big Data vs. Privacy: Striking a Balance

Gleaning the best insights from Big Data requires oceans of data, and sometimes privacy becomes an issue. So then what?

Apple's secretive nature is legendary. For a key part of its history, its projects were highly compartmentalized, and no one knew what other groups did. Employees working on different projects would refuse to sit together in the campus cafeteria for fear of being accused of sharing project details.

And this secretiveness extends to data collection from the iPhone. Apple, like Google, wants to improve on machine learning, particularly as it extends to Siri, but according to a recent report from Reuters, its strict control over data collected by the iPhone is hampering the ability of data scientists to get anything done.

Machine learning experts who want unfettered access to data tend to shy away from jobs at Apple, former employees told Reuters. Apple's data retention on user-centric information gathered by Siri is six months, while information from Apple Maps expires after only 15 minutes. So it's rather difficult to gather data from iPhones using the Maps function.

This gives Google and even Microsoft's Cortana an edge in spotting larger trends and, to the extent this one metric is a factor, Apple's predictions may be less precise.

In a way, Apple should be applauded. It analyzes its users' behavior under some very strict self-imposed constraints to better protect the data from outsiders. But it is leaving Apple data scientists with less data, which means they can't do their job as well.

One Word: Trust

It's a problem that other companies may face if they don't strike a balance between analytics and privacy. After monster breaches at Home Depot, Target, Anthem Blue Cross, UCLA Health and Community Health Systems, people are understandably edgy about the security of their personal information.

Privacy is considered sacrosanct, but it also has its price, notes Tim M. Crawford, CIO Strategic Advisor and president of his consultancy AVOA. "Forget privacy for a second. If we all took our medical records and diagnostics data where we took this pill for this symptom and what result we got, if we took all the data and could compile it, imagine how much further we'd be because it would be a science because of all the data points. But we are apprehensive to do something like that because we have things like HIPAA," he said.

There is a flip side to that argument, but it requires a great deal of trust, said Frank Buytendijk, research vice president and distinguished analyst with Gartner. "There is a case in Denmark where the government inadvertently stored too much health information. In their habit of being transparent, they were open about it and said they would delete it. The general public, trusting their government, pleaded for the opposite. They said 'Keep it! It could help in healthcare'," he said.

This could never happen in the U.S., in part because so many past breaches have shredded confidence in patient privacy and in part because many Americans are less trusting of their government, with more than a few reasons why.

But building trust is the next big challenge, said Crawford. "Culturally, how do we get comfortable with data and how data is used? There is a direct relationship between that statement and trust. So if I trust that Apple will only use this data for their purposes to make Siri better, then that might be okay. But having Apple sell the data and Apple benefitting financially or making it publicly available and potentially compromising my behavior, that's where you lost trust," he said.

Mark Thiele, executive vice president of ecosystem evangelism at data center provider SuperNAP, said trust is a core tenet of making the most of people's personal information for data mining and business intelligence.

"[Companies] need to build trust with their customer base over the data they are custodians of, and they do that by leveraging data in appropriate ways and not abusing it, and taking great care with how they protect it. As soon as you violate that trust you are done. Look at the data breaches and the results we've had," he said. Companies have to figure out on their own how they become a good custodian of the data.

This is not something that can be rushed, either. "It has to happen over time because trust has to play a significant role and the culture has to change as well. More times than not the data is about individuals and behaviors, we have to be comfortable sharing that data. Also we have to have trust in those who are storing and leveraging that data. So the company has a responsibility but so does the individual," said Thiele.

More Instances?

Thiele thinks there will be more incidents like Apple's challenge, but that depends more on the culture of the organization. "It goes back to what the company stands for and what they hold valuable and how they leverage it. Privacy is a core tenet for Apple. Companies that follow along those lines will follow along what Apple does," he said.

Buytendijk said Gartner has some stats to back this up. He said 59% of respondents in Gartner’s CIO Agenda Survey 2015 said that they are already experiencing digital ethical dilemmas, most prominently around privacy and security.

"At the same time, from information surveys we have learned that around 70% of people indicate that in their organizations there is no logical moment or logical place to raise these digital ethical dilemmas," he added.

The issue, however, is not new, nor is it tied to the advent of Big Data, where more data than ever is being collected. "Big Data just shows that there are more sources of data, which means the value of the data can only increase," said Crawford. "The more data points you get more clarity on the problem you're trying to solve. I don't think Big Data itself is a leading indicator or the reason for this problem. Big Data and unstructured data is just another data point."

But Buytendijk disagrees. "Even if you apply all kinds of masking, Big Data has certainly complicated things," he said. "If you know someone’s gender, age and zip code, this is enough already to re-identify the vast majority of people. Big Data most often adds all kinds of contextual information that makes it harder to be anonymous."
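To make the re-identification point concrete, here is a minimal Python sketch that measures how many rows in a data set are singled out by gender, age and zip code alone. The records are invented for illustration, and the function names `unique_fraction` and `smallest_group` are hypothetical; nothing here comes from Gartner or from the companies discussed.

```python
from collections import Counter

# Hypothetical, made-up records: (gender, age, zip_code).
# No names or account IDs are present, yet some rows are still unique.
records = [
    ("F", 34, "90210"),
    ("M", 34, "90210"),
    ("F", 34, "90210"),
    ("M", 51, "10001"),
    ("F", 29, "60614"),
    ("M", 29, "60614"),
    ("M", 29, "60614"),
    ("F", 67, "73301"),
]


def unique_fraction(rows):
    """Fraction of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


def smallest_group(rows):
    """Size of the smallest group of rows sharing one attribute combination."""
    return min(Counter(rows).values())


print("Share of rows that are unique:", unique_fraction(records))
print("Smallest group size:", smallest_group(records))
```

A row that is unique on these three fields can be matched to a named person by anyone holding another list with the same fields. The smallest group size is the figure often called k in k-anonymity: the closer it is to 1, the weaker the anonymity.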

There is a class of technology called dynamic data masking, which replaces identifiable fields with meaningless but consistent codes. The data can still be used for all types of analysis and will show the same results; only some fields are meaningless. When the results are put into action, the codes can be changed back into meaningful information, but only in those cases where it is needed.
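The idea of "meaningless but consistent codes" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any commercial masking product works: the `Masker` class, the `id_` prefix, the sample purchases and the demo key are all hypothetical. It uses a keyed hash (HMAC-SHA256 from the standard library) so that the same input always yields the same code, and it keeps a private lookup table for the cases where a code must be turned back into the original value.

```python
import hashlib
import hmac
from collections import defaultdict


class Masker:
    """Swaps identifying values for consistent codes; keeps a private reverse lookup."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}  # code -> original value; would need secure storage

    def mask(self, value: str) -> str:
        # Keyed hash: the same value always maps to the same code.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        # Truncated for readability; a real system would weigh collision risk.
        code = "id_" + digest[:12]
        self._reverse[code] = value
        return code

    def unmask(self, code: str) -> str:
        # Only callers with access to the lookup table can reverse a code.
        return self._reverse[code]


# Hypothetical purchase log keyed by customer email.
purchases = [
    ("alice@example.com", 20),
    ("bob@example.com", 5),
    ("alice@example.com", 7),
]

masker = Masker(secret_key=b"demo-key-not-for-production")
masked = [(masker.mask(email), amount) for email, amount in purchases]

# Analysis on masked data: total spend per customer code.
totals = defaultdict(int)
for code, amount in masked:
    totals[code] += amount

print(dict(totals))
```

Because the codes are consistent, the per-customer totals are the same as they would be on the raw data. The analyst never sees an email address, and `unmask` is called only for the specific codes that need follow-up.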

However, that's not an ideal solution. Buytendijk cited Georgetown University Professor of Law Paul Ohm’s maxim about privacy, which states "Every perfectly anonymous data set is perfectly unusable." "The more personal identifiable information you strip, the less opportunity the data gives to provide value for individual customers or individuals," he said.

Regardless, Thiele thinks it will happen more often among companies that view privacy as a core tenet, as Apple does. He believes Apple's findings from Siri and Maps data may be the motivation for its strict policies.

"The reality is when you understand how people are using data, how you want to use it is irrelevant," said Thiele. "Once you start to expose trends, you might start to expose data you'd rather not know. I'm sure they have some analytics, like the most common words used with Siri. That could be an indicator as to why they are taking such a hard stance."

Crawford said he thinks it will come down to the different markets. "Agencies have an increasing amount of data. I think they swing toward being overly protective of data. Retail tends to swing the other direction. They tend to be looser about info. They share information on loyalty cards and ad campaigns and there's a lot of behavioral data that comes with it," he said.

