Big Data vs. Privacy: Striking a Balance

Andy Patrizio

Oct 27, 2015

Apple’s secretive nature is legendary. For a key part of its history, its projects were highly compartmentalized, and no one knew what other groups were doing. Employees working on different projects would refuse to sit together in the campus cafeteria for fear of being accused of sharing details on their projects.

And this secretiveness extends to data collection from the iPhone. Apple, like Google, wants to improve its machine learning, particularly as it applies to Siri, but according to a recent report from Reuters, its strict control over data collected by the iPhone is hampering the ability of its data scientists to get anything done.

Machine learning experts who want unfettered access to data tend to shy away from jobs at Apple, former employees told Reuters. Apple retains user-centric information gathered by Siri for six months, while information from Apple Maps expires after only 15 minutes. So it’s rather difficult to gather data from iPhones using the Maps function.

This gives Google and even Microsoft’s Cortana an edge in spotting larger trends and – to the extent this one metric is a factor – Apple’s predictions may be less precise.

In a way, Apple should be applauded. It analyzes its users’ behavior under some very strict self-imposed constraints to better protect the data from outsiders. But that leaves Apple’s data scientists with less data, which means they can’t do their jobs as well.

One Word: Trust

It’s a problem that other companies may face if they don’t strike a balance between analytics and privacy. After monster breaches at Home Depot, Target, Anthem Blue Cross, UCLA Health and Community Health Systems, people are understandably edgy about the security of their personal information.

Privacy is considered sacrosanct, but it also has its price, notes Tim M. Crawford, CIO Strategic Advisor and president of his consultancy AVOA. “Forget privacy for a second. If we all took our medical records and diagnostics data where we took this pill for this symptom and what result we got, if we took all the data and could compile it, imagine how much further we’d be because it would be a science because of all the data points. But we are apprehensive to do something like that because we have things like HIPAA,” he said.

There is a flip side to that argument, though it requires a great deal of trust, said Frank Buytendijk, research vice president and distinguished analyst with Gartner. “There is a case in Denmark where the government inadvertently stored too much health information. In their habit of being transparent, they were open about it and said they would delete it. The general public, trusting their government, pleaded for the opposite. They said ‘Keep it! It could help in healthcare’,” he said.

This could never happen in the U.S., in part because so many past breaches have shredded confidence in patient privacy and in part because many Americans are less trusting of their government, with more than a few reasons why.

But building trust is the next big challenge, said Crawford. “Culturally, how do we get comfortable with data and how data is used? There is a direct relationship between that statement and trust. So if I trust that Apple will only use this data for their purposes to make Siri better, then that might be okay. But having Apple sell the data and Apple benefitting financially or making it publicly available and potentially compromising my behavior, that’s where you lost trust,” he said.

Mark Thiele, executive vice president of ecosystem evangelism at data center provider SuperNAP, said trust is a core tenet of making the most of people’s personal information for data mining and business intelligence.

“[Companies] need to build trust with their customer base over the data they are custodians of, and they do that by leveraging data in appropriate ways and not abusing it, and taking great care with how they protect it. As soon as you violate that trust you are done. Look at the data breaches and the results we’ve had,” he said. Companies have to figure out on their own how to become good custodians of the data.

This is not something that can be rushed, either. “It has to happen over time because trust has to play a significant role and the culture has to change as well. More times than not the data is about individuals and behaviors, we have to be comfortable sharing that data. Also we have to have trust in those who are storing and leveraging that data. So the company has a responsibility but so does the individual,” said Thiele.

More Instances?

Thiele thinks there will be more incidents like Apple’s challenge, but that depends largely on the culture of the organization. “It goes back to what the company stands for and what they hold valuable and how they leverage it. Privacy is a core tenet for Apple. Companies that follow along those lines will follow along what Apple does,” he said.

Buytendijk said Gartner has some stats to back this up. He said 59% of respondents in Gartner’s CIO Agenda Survey 2015 said that they are already experiencing digital ethical dilemmas, most prominently around privacy and security.

“At the same time, from information surveys we have learned that around 70% of people indicate that in their organizations there is no logical moment or logical place to raise these digital ethical dilemmas,” he added.

The issue, however, is not new, nor is it tied to the advent of Big Data, in which more data than ever is being collected. “Big Data just shows that there are more sources of data, which means the value of the data can only increase,” said Crawford. “The more data points you get, the more clarity on the problem you’re trying to solve. I don’t think Big Data itself is a leading indicator or the reason for this problem. Big Data and unstructured data is just another data point.”

But Buytendijk disagrees. “Even if you apply all kinds of masking, Big Data has certainly complicated things,” he said. “If you know someone’s gender, age and zip code, this is enough already to re-identify the vast majority of people. Big Data most often adds all kinds of contextual information that makes it harder to be anonymous.”
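Buytendijk’s point about re-identification can be illustrated with a small sketch. The records, field names and the `unique_fraction` helper below are invented for illustration and are not from Gartner or any real data set; the code simply counts how many records become one of a kind once a few seemingly harmless attributes are combined.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given fields is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[field] for field in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical toy records with names already stripped out.
records = [
    {"gender": "F", "age": 34, "zip": "90210", "purchase": "vitamins"},
    {"gender": "F", "age": 34, "zip": "90210", "purchase": "bandages"},
    {"gender": "M", "age": 51, "zip": "90210", "purchase": "insulin"},
    {"gender": "F", "age": 29, "zip": "10001", "purchase": "inhaler"},
    {"gender": "M", "age": 51, "zip": "10001", "purchase": "aspirin"},
]

by_gender = unique_fraction(records, ["gender"])
by_all = unique_fraction(records, ["gender", "age", "zip"])
print(f"unique by gender alone: {by_gender:.0%}")
print(f"unique by gender + age + zip: {by_all:.0%}")
```

In this toy set nobody is unique by gender alone, but three of the five records are unique once age and zip code are added. Anyone who knows those three facts about a person could pick out that person’s row, which is the effect additional contextual data tends to amplify.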

There is a class of technology, called dynamic data masking, that replaces identifiable fields with meaningless but consistent codes. The data can still be used for all types of analysis and will show the same results; some fields are simply meaningless. When the results are put into action, the codes can be changed back to meaningful values, but only in those cases where needed.
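The masking idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular masking product works: the `Masker` class, the `ID_` code format and the in-memory lookup table are all hypothetical, and what is shown is closer to what is often called pseudonymization or tokenization. A keyed hash gives each value the same code every time, and a protected lookup table allows the code to be reversed when needed.

```python
import hashlib
import hmac


class Masker:
    """Hypothetical helper: consistent, reversible codes for identifying fields."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._vault = {}  # code -> original value; would need strict access control

    def mask(self, value: str) -> str:
        # A keyed hash yields the same code for the same input every time.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        code = "ID_" + digest[:12]
        self._vault[code] = value
        return code

    def unmask(self, code: str) -> str:
        # Reverse the mapping only where there is a need to act on a result.
        return self._vault[code]


masker = Masker(b"demo-key-not-for-production")
rows = [("alice@example.com", 40), ("bob@example.com", 15), ("alice@example.com", 25)]
masked_rows = [(masker.mask(email), amount) for email, amount in rows]

# Analysis runs on masked data and still groups correctly.
totals = {}
for code, amount in masked_rows:
    totals[code] = totals.get(code, 0) + amount

top_code = max(totals, key=totals.get)
top_customer = masker.unmask(top_code)
print(top_code, totals[top_code])
```

Because the codes are consistent, the spending totals come out the same as they would on the raw data, and only the final lookup of the top customer touches the real identity.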

However, that’s not an ideal solution, Buytendijk said. He cited Georgetown University Professor of Law Paul Ohm’s maxim about privacy, which states “Every perfectly anonymous data set is perfectly unusable.” “The more personally identifiable information you strip, the less opportunity the data gives to provide value for individual customers or individuals,” he said.

Regardless, Thiele thinks such constraints will become more common among companies that view privacy as a core tenet, as Apple does. He believes Apple’s findings from Siri and Maps data may be the motivation for its strict policies.

“The reality is when you understand how people are using data, how you want to use it is irrelevant,” said Thiele. “Once you start to expose trends, you might start to expose data you’d rather not know. I’m sure they have some analytics, like the most common words used with Siri. That could be an indicator as to why they are taking such a hard stance.”

Crawford said he thinks it will come down to the different markets. “Agencies have an increasing amount of data. I think they swing toward being overly protective of data. Retail tends to swing the other direction. They tend to be looser about info. They share information on loyalty cards and ad campaigns and there’s a lot of behavioral data that comes with it,” he said.

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.