You may think you are the only person who knows where you are when you are using your phone, but a group of MIT researchers [1] says you shouldn’t kid yourself:
The researchers use a statistical model that tracks location stamps of users in both datasets and provides a probability that data points in both sets come from the same person. In experiments, the researchers found the model could match around 17 percent of individuals in one week’s worth of data, and more than 55 percent of individuals after one month of collected data. The work demonstrates an efficient, scalable way to match mobility trajectories in datasets, which can be a boon for research. But, the researchers warn, such processes can increase the possibility of deanonymizing real user data…
“In publishing the results — and, in particular, the consequences of deanonymizing data — we felt a bit like ‘white hat’ or ‘ethical’ hackers,” adds co-author Carlo Ratti, a professor of the practice in MIT’s Department of Urban Studies and Planning and director of MIT’s Senseable City Lab. “We felt that it was important to warn people about these new possibilities [of data merging] and [to consider] how we might regulate it.” Rob Matheson, “The privacy risks of compiling mobility data” at MIT News
The research demonstrated that while urban planners can use the personal data they aggregate to help resolve parking and accessibility issues, a captured set of such data could also be “deanonymized”:
While the MIT group wasn’t trying to unmask specific users in this dataset, they proved that someone acting in bad faith could merge such anonymized datasets with personal ones using the same process, easily pinning the timestamps together to figure out who was who. KELSEY CAMPBELL-DOLLAGHAN, “Sorry, your data can still be identified even if it’s anonymized” at Fast Company
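To make the idea of “pinning the timestamps together” concrete, here is a minimal sketch of co-occurrence matching. It is a toy illustration, not the researchers’ statistical model: the function names, the one-hour time window, and the sample traces are all assumptions made up for this example. It counts how often an anonymous ID and a named ID appear at the same place in the same time window, and picks the named ID with the most overlaps.

```python
from collections import Counter, defaultdict


def bin_record(place, timestamp, window=3600):
    """Reduce a record to a coarse (place, time-bin) key."""
    return (place, timestamp // window)


def match_users(anon_traces, known_traces, window=3600):
    """For each anonymous ID, find the known ID sharing the most (place, time-bin) keys."""
    # Index the identified dataset: key -> set of known users seen there and then.
    index = defaultdict(set)
    for user, records in known_traces.items():
        for place, ts in records:
            index[bin_record(place, ts, window)].add(user)

    matches = {}
    for anon_id, records in anon_traces.items():
        counts = Counter()
        # Use a set so repeated pings in one bin count only once.
        for key in {bin_record(p, t, window) for p, t in records}:
            for user in index.get(key, ()):
                counts[user] += 1
        if counts:
            matches[anon_id] = counts.most_common(1)[0]  # (best user, overlap count)
        else:
            matches[anon_id] = (None, 0)
    return matches


# Hypothetical sample data: (place, seconds since start of observation).
anon = {
    "a1": [("station_A", 100), ("station_B", 4000), ("station_C", 8000)],
    "a2": [("station_D", 200)],
}
known = {
    "alice": [("station_A", 500), ("station_B", 4100), ("station_C", 8200)],
    "bob": [("station_A", 900), ("station_Z", 4000)],
}

result = match_users(anon, known)
print(result)
```

In this made-up data, the anonymous trace "a1" overlaps with "alice" in three windows and with "bob" in only one, so it is matched to "alice". A longer observation period gives more chances for overlaps to accumulate, which is consistent with the reported rise in match rates from one week to one month.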
A recent long article in the New York Times shows the detail of the information that is now collected, tracking one cell phone user’s day (with her permission):
Only one person makes that trip: Lisa Magrin, a 46-year-old math teacher. Her smartphone goes with her.
An app on the device gathered her location information, which was then sold without her knowledge. It recorded her whereabouts as often as every two seconds, according to a database of more than a million phones in the New York area that was reviewed by The New York Times. While Ms. Magrin’s identity was not disclosed in those records, The Times was able to easily connect her to that dot.
The app tracked her as she went to a Weight Watchers meeting and to her dermatologist’s office for a minor procedure. It followed her hiking with her dog and staying at her ex-boyfriend’s home, information she found disturbing. JENNIFER VALENTINO-DeVRIES, NATASHA SINGER, MICHAEL H. KELLER and AARON KROLIK, “Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret” at The New York Times
Just as Ms. Magrin finds the information collected on her disturbing, others may find it profitable. The Weather Channel is being accused of illegally selling private data:
In its lawsuit against Time Warner Cable, filed in L.A. County Superior Court, the city attorney’s office claims that the company tracks the exact location of its Weather Channel app users, and then sells that private information to advertisers without its users’ knowledge.
“For years, TWC has deceptively used its Weather Channel App to amass its users’ private, personal geolocation data — tracking minute details about its users’ locations throughout the day and night, all the while leading users to believe that their data will only be used to provide them with ‘personalized local weather data, alerts and forecasts,’” the complaint reads. “LA Sues Weather Channel For Illegally Selling Private Data Of Mobile App Users” at CBS Los Angeles
Your location can be a useful guide to your buying habits, whether or not you want to buy anything or think anyone has any business snooping on you to find out.
What can we do to protect our privacy? Relatively straightforward tips include opting out of ad personalization, turning off location-based ads, and making use of settings that disable, for example, motion sensors that track your body’s movements.
Some may find that the best solution is a virtual private network (VPN):
In simple terms, a VPN encrypts the network data on your computer so others–such as your ISP or someone snooping on a public Wi-Fi network you’re using–can’t read it. The VPN then routes all your encrypted internet traffic through a secure server before sending it on to the website you want to access. By doing this, it ensures that websites and other online services won’t be able to see your true IP address or know where in the world your computer is actually located; they’ll only see the location of the VPN’s server. That means your true identity, location, and what you do online is–to a large extent–concealed from prying eyes. MICHAEL GROTHAUS, “The one thing you should do to protect your privacy in 2019” at Fast Company
Grothaus recommends avoiding “free” VPN services because, as George Gilder reminds readers in Life after Google, when the communications and information service is free, that is because you are the product (that is, your data is sold). We live in a world where advertisers will pay Google and others more for our data than most of us will pay Google for their communications services. But if, on the other hand, we do choose to pay someone for privacy, we have the right to insist on it.
We could, of course, wait for scandals, exposés, invasion-of-privacy lawsuits, and criminal charges to bring about some reform in the marketplace. But that approach carries the unforeseen risks and hassles of being a party to one of the complaints.
[1] Here’s their paper:
Abstract: The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success only after a one-week long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals. (paywall) Daniel Kondor; Behrooz Hashemian; Yves-Alexandre de Montjoye, Towards matching user mobility traces in large-scale dataset, 24 September 2018, 10.1109/TBDATA.2018.2871693
See also: Our anonymity may be an illusion Because we talk about ourselves so much online, few leaked pieces may even be required to identify us.
The $60 billion-dollar medical data market is coming under scrutiny As a patient, you do not own the data and are not as anonymous as you think