After you browse car deals online, Google begins showing you car advertisements. Coursera, an online learning website, automatically collects data about your typing patterns and uses it to verify your identity. Whether you like it or not, the food you like to eat, the places you often go, and other aspects of your life can be easily predicted from your data, and firms and institutions are constantly collecting that data to improve their services.
Our daily communications and the data stored on our devices are also at risk. The NSA’s mass surveillance program illegally collected our telephony metadata, and this substantially lowered citizens’ trust in government. Governments have to find a balance between security and privacy to restore that trust. In George Orwell’s novel Nineteen Eighty-Four, the Thought Police use constant surveillance to find people who have “illegal” thoughts. This surveillance successfully creates a safe environment for citizens and for the authorities themselves, but the cost is that people lose their individuality. After all, people need to protect their privacy because it enables them to maintain their individuality. We need government because we want to secure our rights, not the other way around. When a government has too much power, the horror of “Big Brother is watching you” may become a reality.
In 2016, the FBI tried to compel Apple to help it bypass the iPhone’s security system. Many people opposed this because it could give the FBI the ability to unlock every iPhone in the world. The opposition also reflects the fact that citizens no longer trust their government to protect their data privacy. Nowadays, both firms and governments are interested in our data. Firms use it to target advertisements, predict market outcomes, and increase revenue; governments monitor our communications to identify potential terrorists and other threats. Can we still have data privacy in this information age? To answer this question, we first need to define the term.
What Is Data Privacy?
Data privacy concerns data that contains personally identifiable information and other sensitive information. Personally identifiable information (PII) is data that can be traced back to an individual. In 2006, Netflix released 100 million movie ratings in order to improve its movie recommendation algorithm. Although the company removed personally identifiable information (user IDs etc.), 16 days later two researchers from the University of Texas identified some of the users with a technique called record linkage: they compared Netflix’s data with external information (such as IMDb reviews) and found the overlaps. Users have the right to prevent their private information from being published, but publishing a data set does not violate users’ privacy if it cannot be traced back to individuals. In the Netflix case, however, the researchers were able to use record linkage to identify specific users’ movie viewing activity, which suggests that Netflix accidentally exposed users’ private viewing histories and violated their data privacy.
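The record linkage idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the researchers’ actual algorithm: all names, movies, ratings, and dates are made up, and the matching rule (same movie, same rating, dates within a few days) and the `min_matches` threshold are assumptions chosen for the example.

```python
from datetime import date

# Hypothetical "anonymized" release: pseudonymous ID -> (movie, rating, date) records.
anonymized = {
    "user_017": [("Movie A", 5, date(2005, 3, 1)),
                 ("Movie B", 2, date(2005, 3, 9)),
                 ("Movie C", 4, date(2005, 4, 2))],
    "user_042": [("Movie A", 3, date(2005, 3, 1)),
                 ("Movie D", 5, date(2005, 5, 5))],
}

# Hypothetical public reviews posted under real names on another site.
public = {
    "alice": [("Movie A", 5, date(2005, 3, 2)),
              ("Movie C", 4, date(2005, 4, 3))],
    "bob": [("Movie D", 5, date(2005, 5, 6))],
}


def overlap(private_rows, public_rows, max_days=3):
    """Count public reviews that have a matching record in the private rows."""
    count = 0
    for movie, rating, day in public_rows:
        for p_movie, p_rating, p_day in private_rows:
            same_item = movie == p_movie and rating == p_rating
            if same_item and abs((day - p_day).days) <= max_days:
                count += 1
                break
    return count


def link(anonymized, public, min_matches=2):
    """Link a public name to a pseudonymous ID when the best match is strong and unique."""
    links = {}
    for name, rows in public.items():
        scores = {uid: overlap(priv, rows) for uid, priv in anonymized.items()}
        best = max(scores, key=scores.get)
        top = scores[best]
        is_unique = list(scores.values()).count(top) == 1
        if top >= min_matches and is_unique:
            links[name] = best
    return links


links = link(anonymized, public)
```

In this toy data, two overlapping reviews are enough to tie one public name to one pseudonymous ID, which then reveals that person’s other ratings in the release. A single overlapping review falls below the threshold and is not linked.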
The data collected by firms and governments can be divided into three categories: explicit data, implicit data, and external data. Explicit data is what we choose to share with institutions, such as preferences and personally identifiable information; implicit data is information gathered by analyzing users’ behavior; and external data is obtained from other agencies. Alessandro Acquisti and Ralph Gross conducted a study on predicting social security numbers from public data. They were able to match the first 5 digits for 44% of individuals born after 1988 on the first attempt. This astonishing result indicates that the explicit data we share with the outside world can reveal not only our lifestyles but also sensitive information about our identities.
How Can We Protect Our Data Privacy?
Many firms collect only the explicit data that we voluntarily give them. Their priority should be to strengthen the defenses around that data, because a large portion of data breaches are caused by attacks on databases. This spring, hackers stole the social security numbers of about half the US population from Equifax. Companies must improve their protection of sensitive user data to prevent such breaches.
Firms like Netflix want only to improve their algorithms, not to expose users’ data. For institutions that intend to publish data sets, however, removing personally identifiable information is not enough, because a published data set still carries a risk of re-identification. For instance, suppose a hospital publishes a data set indicating that 1 of the 10 patients who visited the hospital today has AIDS. If a person can find out the health status of 9 of those patients, then that person can infer whether the remaining one has AIDS. Public data sets may therefore expose users’ private information, as the Netflix case shows.
Is there a way to use data without exposing private information? This question is why the idea of differential privacy keeps coming up. Differential privacy is a way of sanitizing data: it adds carefully calibrated random noise to what is published so that the output barely depends on any single individual, while aggregate results stay approximately accurate. Cynthia Dwork and her colleagues have already developed the mathematical model behind it. Differential privacy is essentially about adding uncertainty to a data set, so that even someone who finds out 9 people’s health status still cannot determine with confidence whether the remaining person has AIDS. This technique can sharply limit the risk of re-identification and protect the data privacy of the individuals included in the data set.
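One common way to add this kind of uncertainty is the Laplace mechanism, sketched below for the hospital example. This is a minimal illustration under stated assumptions, not a production implementation: the patient records, the function names `laplace_noise` and `private_count`, and the choice of epsilon = 0.5 are all hypothetical. A counting query changes by at most 1 when one person is added or removed, so the noise scale is 1/epsilon.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of records matching predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical records: ten patients, exactly one of whom tests positive.
patients = [{"id": i, "positive": i == 7} for i in range(10)]

rng = random.Random(0)  # seeded only so the sketch is repeatable


def is_positive(patient):
    return patient["positive"]


# What the hospital would publish instead of the exact count.
noisy = private_count(patients, is_positive, epsilon=0.5, rng=rng)

# Averaged over many hypothetical releases, the noise cancels out, so
# aggregate accuracy is preserved even though a single release is blurred.
samples = [private_count(patients, is_positive, 0.5, rng) for _ in range(20000)]
average = sum(samples) / len(samples)
```

Because the published count is blurred, an observer who already knows nine patients’ status cannot tell with confidence whether the true count was 0 or 1. A smaller epsilon means more noise and stronger privacy, at the cost of less accurate published figures.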
Governments should also respect our data privacy and restrict companies’ use of user data. For instance, the Google Street View case drew the attention of the general public in 2013. Google’s Street View cars had collected passwords, emails, and other sensitive information, and Google only had to pay $7 million to settle the case. Compared with the company’s annual revenue, which runs to tens of billions of dollars, a fine of this size cannot prevent such a violation of privacy from happening again. Governments should introduce more laws and regulations on privacy and raise the cost to firms of violating citizens’ data privacy.
The concept of big data is still relatively new. The primary goal of firms and governments in using our data is to improve the effectiveness of their services. Protecting data privacy through individual effort alone is nearly impossible; institutions, governments, and citizens all have to contribute. We still have a long way to go before data privacy is as well secured as property rights, but we will eventually have our data privacy back in this information age.