Recommended article: Is Deidentification Sufficient to Protect Health Privacy in Research?

Mark A. Rothstein of the University of Louisville Institute for Bioethics, Health Policy, and Law has an article in The American Journal of Bioethics (Volume 10, Issue 9, 2010), “Is Deidentification Sufficient to Protect Health Privacy in Research?” Here’s the abstract:

The revolution in health information technology has enabled the compilation and use of large data sets of health records for genomic and other research. Extensive collections of health records, especially those linked with biological specimens, are also extremely valuable for outcomes research, quality assurance, public health surveillance, and other beneficial purposes. The manipulation of large quantities of health information, however, creates substantial challenges for protecting the privacy of patients and research subjects. The strategy of choice for many health care providers and research institutions in dealing with this challenge has been to de-identify individual health information.

As regular readers of PHIprivacy.net know, I have been hammering at privacy issues raised by databases of health care information – including databases that are supposedly “deidentified.” Here’s part of what Dr. Rothstein writes about re-identification of deidentified data:

Despite using various measures to deidentify health records, it is possible to reidentify them in a surprisingly large number of cases by using computerized network databases containing voter registration records, hospital discharge records, commercially available databases, and other sources (Malin and Sweeney 2004; Sweeney 2002). Indeed, it is likely that between 63% (Golle 2006) and 87% (Sweeney 2000) of the population of the United States could be uniquely identified by using only gender, ZIP code, and date of birth. The cost of doing so, however, would vary by state, because of the different prices charged for voter registration data (Benitez and Malin 2010).

Reidentification of genomic samples in biobanks is also possible using publicly available databases, thereby raising the question of whether genetic information can ever be considered deidentified in the sense that it cannot be linked with other genetic samples (McGuire and Gibbs 2006). After a scientific paper demonstrated it was theoretically possible to identify an individual’s genomic attribute data in a pooled or aggregated sample (Homer et al. 2008), the National Human Genome Research Institute immediately restricted public access to pooled genomic data.
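
To make the uniqueness figures in that excerpt concrete, here is a minimal sketch of how one might measure what share of a data set is unique on gender, ZIP code, and date of birth alone. It is not taken from Dr. Rothstein's article or from the studies he cites; the records are invented, and the function name `unique_fraction` is a hypothetical one chosen for illustration.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only:
# (gender, ZIP code, date of birth). No names or other direct identifiers.
records = [
    ("F", "40202", "1970-01-15"),
    ("M", "40202", "1970-01-15"),
    ("F", "40202", "1970-01-15"),
    ("M", "40205", "1982-06-30"),
    ("F", "40207", "1955-11-02"),
]


def unique_fraction(rows):
    """Return the share of rows whose combination of fields occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows) if rows else 0.0


fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on gender, ZIP code, and date of birth")
```

In this toy set, three of the five records are one of a kind. Any record that is unique on those three fields could, in principle, be matched against an outside source that carries the same fields alongside a name, such as the voter registration records mentioned in the excerpt.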

You can download the free full-text article from SSRN.

Additional articles on privacy and de-identification can be found in the September issue of The American Journal of Bioethics.
