We tend to assume that aggregated healthcare data is anonymous and that our privacy is therefore protected. But a doctor (and philosopher) who surveys the current scene suggests that we are forgetting something: We talk about ourselves too much to stay anonymous. That matters in healthcare, but healthcare is not the only field affected:
In what is now a classic study, researchers used de-identified credit card data for 1.1 million people, in 10,000 stores over a three-month period. Using just four pieces of “outside” data they could identify 90% of the shoppers. If the aggregation of the data is expanded, the categories are said to be more coarse. For example, rather than looking at each store as an individual site, we group them into Walmarts and Targets we make reidentification harder, but not impossible. The researchers found that coarser categories reduced unicity slowly and that reidentification only requires a few more outside data points. For this shopping data, knowing that your target was a woman increased the ability to identify them by 20% compared to men. Bottom line, given computational resources, and the plethora of our behavioral data which we freely provide through our transactions and phones, de-identification of data is often possible despite whatever safeguards we believe we have put in place. Chuck Dinerstein, “Privacy Is An Increasing Illusion” at American Council on Science and Health
Here’s the paper. (open access)
Dr. Dinerstein believes that gathering large amounts of data from individual patients (metadata) can be very helpful, but that research on mechanisms to ensure privacy should be supplemented by “more quantitative assessment” of how easily anonymized individuals can be re-identified in the event of leaks.
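For readers who want a feel for what a quantitative assessment of this kind might look like, here is a minimal Python sketch of the "unicity" idea mentioned in the excerpt: the share of people who are singled out by a handful of known data points. It is an illustration only, not the researchers' actual procedure. The toy traces, the store and chain names, and the helper functions `unicity` and `coarsen` are all made up for this example, and the calculation checks every possible combination of points, which is only practical on tiny data.

```python
from itertools import combinations

# Hypothetical toy data: each person's trace is a set of (store, day) purchases.
traces = {
    "a": {("store1", 1), ("store2", 2), ("store3", 3)},
    "b": {("store1", 1), ("store4", 2), ("store3", 3)},
    "c": {("store5", 1), ("store2", 2), ("store6", 3)},
}

# Hypothetical mapping from individual stores to coarser chain categories.
chain_of = {
    "store1": "ChainX", "store2": "ChainX", "store4": "ChainX",
    "store5": "ChainX", "store3": "ChainY", "store6": "ChainY",
}


def unicity(traces, p):
    """Average, over people, of the share of p-point subsets of a
    person's trace that match nobody else's trace."""
    per_person = []
    for person, points in traces.items():
        subsets = list(combinations(sorted(points), p))
        if not subsets:
            continue  # this trace has fewer than p points
        unique = 0
        for subset in subsets:
            known = set(subset)
            # The subset singles the person out if no other trace contains it.
            matched_elsewhere = any(
                known <= other_points
                for other, other_points in traces.items()
                if other != person
            )
            if not matched_elsewhere:
                unique += 1
        per_person.append(unique / len(subsets))
    return sum(per_person) / len(per_person) if per_person else 0.0


def coarsen(traces, chain_of):
    """Replace each store with its chain, making the categories coarser."""
    return {
        person: {(chain_of[store], day) for store, day in points}
        for person, points in traces.items()
    }


print("unicity with 1 known point: ", unicity(traces, 1))
print("unicity with 2 known points:", unicity(traces, 2))
print("after coarsening, 2 points: ", unicity(coarsen(traces, chain_of), 2))
```

In this toy data, knowing a second point makes people easier to single out, and grouping stores into chains makes them harder to single out. The toy set is far too small to reproduce the study's finding that coarsening helps only slowly on real data.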