Reporting Fail: The Reidentification of Personal Genome Project Participants

The issue of how easy – or difficult – it might be to re-identify “de-identified” data is crucial to discussions of using PHI in research. Jane Yakowitz writes:

Last week, a Forbes article by Adam Tanner announced that a research team led by Latanya Sweeney had re-identified “more than 40% of a sample of anonymous participants” in Harvard’s Personal Genome Project. Sweeney is a progenitor of demonstration attack research. Her research was extremely influential during the design of HIPAA, and I have both praised and criticized her work before.

Right off the bat, Tanner’s article is misleading. From the headline, a reader would assume that research participants were re-identified using their genetic sequence. And the “40% of a sample” line suggests that Sweeney had re-identified 40% of a random sample. Neither of these assumptions is correct. Even using the words “re-identified” and “anonymous” is improvident. Yet the misinformation has proliferated, with rounding up to “nearly half” or “97%.”

Here’s what actually happened:
