Companies Anonymized Data May Violate GDPR, Privacy Regs

Published: 23/11/2024   Category: security




A new study found that any database containing 15 pieces of demographic data could be used to identify individuals.



For more than two decades, researchers have chipped away at the assumption that anonymized collections of data can protect the identities of research subjects as long as the datasets omit any of a score of specific identifying attributes.
In the latest research highlighting the ease of what is known as re-identification, three academic researchers have shown that 99.98% of Americans could be re-identified from an otherwise anonymized dataset if it included 15 demographic attributes.
The findings suggest that even current policies surrounding the protection of customer identities, such as the General Data Protection Regulation (GDPR), fall short of truly protecting citizens.
In the paper, which appeared in Nature on July 23, the researchers conclude that even heavily sampled anonymized datasets are "unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model."
The paper adds to the mountain of research suggesting that any dataset that contains useful information about individuals likely could be used to re-identify those subjects and link individuals to information that may be protected by privacy regulations or law. The research could lead to a rethinking of whether all big data sets need to be significantly better protected.
"Many companies think that, if it's anonymous, I don't need to secure it, but the data is likely not as anonymous as they think," says Bruce Schneier, a lecturer at Harvard University's Kennedy School of Government and the author of Data and Goliath, a book about how companies' data collection results in a mass-surveillance infrastructure. "Again and again and again, we have learned that anonymization of data is extremely hard. People are unique enough that data about them is enough to identify them."
The findings mean that companies and government agencies need to reassess how they deal with anonymized data, says Scott Giordano, vice president of data protection at Spirion, a provider of data-security services. The US Department of Health and Human Services, for example, currently requires that businesses remove 18 different classes of information from files, or have an expert review their anonymization techniques, to certify data as non-identifying.
That may not be enough, he says.
"It is too easy, with advances in big data, to de-anonymize things that maybe you couldn't have de-anonymized five years ago," Giordano says. "We are in an arms race between the desire to anonymize data and our collection of big data, and big data is winning."
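The HHS approach Giordano describes is list-based de-identification: strip a fixed set of direct identifiers and release the rest. A minimal sketch of the idea follows; the field names are illustrative stand-ins, not the actual 18 HIPAA classes, and the point is what survives the scrub.

```python
# Minimal sketch of list-based de-identification: drop a fixed set of
# direct identifiers and release everything else. Field names are
# illustrative, not the real 18 HIPAA "safe harbor" categories.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "street_address"}

def scrub(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",    # direct identifier -- removed
    "ssn": "123-45-6789",  # direct identifier -- removed
    "zip": "02139",        # quasi-identifier -- survives
    "sex": "F",            # quasi-identifier -- survives
    "dob": "1970-07-31",   # quasi-identifier -- survives
}
print(scrub(patient))  # {'zip': '02139', 'sex': 'F', 'dob': '1970-07-31'}
```

The surviving fields are exactly the kind of demographic quasi-identifiers the re-identification research exploits, which is why removing listed identifiers alone "may not be enough."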
Zip Code, Gender, and DoB
The concerns over re-identification appeared in the late 1990s, when then-graduate student Latanya Sweeney conducted research into the possibility of combining voter rolls and medical research records on Massachusetts state employees to de-anonymize patients' information. Famously, Sweeney, now a professor of government and technology in residence at Harvard University, was able to find then-Governor William Weld's medical record in the dataset. In a 2000 paper, she estimated that 87% of US citizens could be identified using just three pieces of information: their 5-digit zip code, gender, and date of birth.
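Sweeney's 87% figure rests on a simple measurement: count how many records are unique on a given combination of quasi-identifiers. A self-contained toy sketch of that measurement (the records are made up; real studies use census-scale data, and this is not her method verbatim):

```python
from collections import Counter

def unique_fraction(records, quasi_ids):
    """Fraction of records whose quasi-identifier combination is unique
    in the dataset -- i.e., records a linkage attack could pin down."""
    key = lambda r: tuple(r[q] for q in quasi_ids)
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)

# Made-up toy records for illustration only.
records = [
    {"zip": "02139", "sex": "F", "dob": "1970-07-31"},
    {"zip": "02139", "sex": "F", "dob": "1970-07-31"},
    {"zip": "02139", "sex": "M", "dob": "1968-01-02"},
    {"zip": "90210", "sex": "M", "dob": "1975-03-04"},
]
print(unique_fraction(records, ["sex"]))                # 0.0 -- one coarse attribute
print(unique_fraction(records, ["zip", "sex", "dob"]))  # 0.5 -- the full triple
```

Even in this four-row toy, adding attributes to the key pushes the unique fraction up; with 15 attributes over a real population, the Nature paper found it reaches 99.98%.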
With the collection of a broad range of data proliferating from personal devices — not just from smartphones, but from Apple Watches to connected mattresses — technology firms and data aggregators are making choices that affect the rights of US citizens, she argued in a speech at Stanford University's School of Engineering in 2018.
"We live in a technocracy — that is, we live in a world in which technology design dictates the rules we live by," she said. "We don't know these people, we didn't vote them into office, there was no debate about their design, but yet the rules they determined by the design decisions they make — and many of them somewhat arbitrary — end up dictating how we will live our lives."
The Nature paper, written by a team of three researchers from Imperial College London and Belgium's UCLouvain, shows that the massive number of attributes collected about people makes them more unique. For companies, the lesson is that any sufficiently detailed dataset cannot, by definition, be anonymous. Even releasing partial datasets runs the risk of re-identification, the researchers found.
"Moving forward, (our results) question whether current de-identification practices satisfy the anonymization standards of modern data protection laws such as GDPR and CCPA (the California Consumer Privacy Act) and emphasize the need to move, from a legal and regulatory perspective, beyond the de-identification release-and-forget model," the researchers stated in the paper.
This leaves companies with no easy answers on whether following current guidelines is enough to protect the anonymity of the information in their care, says Pravin Kothari, CEO of CipherCloud, a data-security provider. 
"This finding proves that re-identification is easy, so companies need to make sure they are anonymizing all demographic data, not just names," he says. "The removal of names is simply not enough to properly de-identify a person. We'll need to ensure that all personally identifiable information is anonymized in order to remove the risk of re-identification of individuals."
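Anonymizing the demographic attributes themselves, as Kothari suggests, is typically done by generalization: coarsening quasi-identifiers until many records share each combination. A hedged sketch, with illustrative field names and coarsening rules (not a recommendation for any particular scheme):

```python
def generalize(record: dict) -> dict:
    """Coarsen quasi-identifiers: keep only a 3-digit zip prefix and the
    birth year. Illustrative rules only; real schemes tune granularity
    until each combination is shared by at least k records (k-anonymity)."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["dob"] = record["dob"][:4]  # "YYYY-MM-DD" -> "YYYY"
    return out

print(generalize({"zip": "02139", "sex": "M", "dob": "1968-01-02"}))
# {'zip': '021**', 'sex': 'M', 'dob': '1968'}
```

The trade-off is the one the article describes: coarse enough to defeat re-identification may mean too coarse to be useful, which is why the researchers argue for moving beyond release-and-forget entirely.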
