Anonymous Data Collection Is Riskier Than Users Think
MIT researchers have found that it's relatively easy to identify users through "anonymous" data collection.
January 4, 2019
Without much thought, users frequently sign away rights to supposedly anonymous data collected by businesses and other organizations, with the idea that the information will ultimately improve their services and online experience. But a new study from MIT suggests that "anonymous" data can be woven together using multiple sources and is far more identifiable than most people realize.
The researchers used logs from a mobile network operator and timestamps from a public transportation system in Singapore to match individuals. With a week’s worth of anonymous data collection, the researchers could match the cell phone logs and trip timestamps to a unique user about 17 percent of the time. With a month’s data, a person could be identified about 55 percent of the time. In 11 weeks, users could be identified 95 percent of the time.
The researchers also said they could easily speed up identification of users by adding another stream of data. With less than a week’s data, the researchers estimated they could identify 95 percent of people by using the sort of GPS location data that’s regularly collected by smartphone apps.
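To make the matching idea concrete, here is a minimal Python sketch of how two "anonymous" datasets might be linked by shared places and times. It is not the MIT team's method, which the article does not describe in detail. The record layout, the zone names, the five-minute window, and the function name `match_users` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (pseudonymous id, zone, minute of the week).
phone_logs = [
    ("phone_A", "zone_1", 480), ("phone_A", "zone_7", 1050),
    ("phone_B", "zone_1", 485), ("phone_B", "zone_3", 1100),
]
transit_taps = [
    ("card_X", "zone_1", 482), ("card_X", "zone_7", 1047),
    ("card_Y", "zone_1", 483), ("card_Y", "zone_3", 1102),
]

def match_users(logs_a, logs_b, window=5):
    """Link each id in logs_a to the logs_b id it co-occurs with most often.

    Two records co-occur when they share a zone and their timestamps differ
    by at most `window` minutes. A tie for first place yields None, meaning
    the data is not yet enough to single out one candidate.
    """
    # Index the second dataset by zone so only same-place records are compared.
    by_zone = defaultdict(list)
    for b_id, zone, minute in logs_b:
        by_zone[zone].append((b_id, minute))

    # Count co-occurrences for every (a_id, b_id) pair.
    scores = defaultdict(Counter)
    for a_id, zone, minute in logs_a:
        for b_id, b_minute in by_zone[zone]:
            if abs(minute - b_minute) <= window:
                scores[a_id][b_id] += 1

    # Keep a match only when one candidate is strictly ahead of the rest.
    matches = {}
    for a_id, counter in scores.items():
        ranked = counter.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            matches[a_id] = ranked[0][0]
        else:
            matches[a_id] = None
    return matches

full_result = match_users(phone_logs, transit_taps)
single_point_result = match_users(phone_logs[:1], transit_taps)
```

In this toy data, a single observation leaves `phone_A` tied between two transit cards, while a second observation breaks the tie. That is the same dynamic the study reports at scale: the longer the collection period, the more likely a trajectory is unique.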
“As researchers, we believe that working with large-scale datasets can allow discovering unprecedented insights about human society and mobility, allowing us to plan cities better,” said Daniel Kondor, a postdoc at the Singapore-MIT Alliance for Research and Technology, in a statement. “Nevertheless, it is important to show if identification is possible, so people can be aware of potential risks of sharing mobility data.”
And though the amount of data analyzed was large (485 million records from more than 2 million users), it took only a few trips a day to identify the travelers.
"I was at Sentosa Island in Singapore two days ago, came to the Dubai airport yesterday, and am on Jumeirah Beach in Dubai today,” said Carlo Ratti, a co-author of the study and urban studies professor at MIT. “It's highly unlikely another person's trajectory looks exactly the same. In short, if someone has my anonymized credit card information, and perhaps my open location data from Twitter, they could then deanonymize my credit card data. All data with location stamps--which is most of today's collected data--is potentially very sensitive, and we should all make more informed decisions on who we share it with."
If an app is asking for your permission to use location data, Kondor suggests asking yourself two questions: Does the app really need to know where you are to do its job? If it’s a taxi service, the answer might be yes; for a chat program, not so much. And is there another comparable product or service that doesn’t require your location, even if it requires an extra step?
“This effort could be, for example, opening a web browser and searching for information that the app would provide, or using a weather app that displays information in selected user-cities instead of choosing location automatically based on access to more precise GPS-based location. In the future, we anticipate that device manufacturers could make more fine-grained control over location sharing easier,” Kondor said. “User awareness and demand would help convince both device manufacturers and app developers to adopt these better practices.”