Good Luck Explaining to HHS Why Your PHI Is in GitHub’s Vault for the Next 1,000 Years

You may see a number of hospitals and covered entities issuing statements this week about a data security incident involving Med-Data (Med-Data, Incorporated). So far, Memorial Hermann, U. of Chicago, Aspirus, and OSF Healthcare have posted notices. Others should also be posting notices, and some may do so soon. Here’s DataBreaches.net’s exclusive report on the incident.

Another Day, Another GitHub Leak?

In August 2020, Dutch independent security researcher Jelle Ursem and DataBreaches.net published a paper describing nine leaks of protected health information found in public GitHub repositories.

In November, Ursem discovered yet another developer who had compromised the security of protected health information (PHI) by exposing it in public repositories. Much of the data appeared to be claims data (Electronic Data Interchange, or EDI, data). Because the data came from a number of different clinical entities and involved claims, it appeared that we were looking for a business associate rather than a single covered entity. Our investigation into the data and the covered entities suggested that the firm might be Med-Data.

On December 8, DataBreaches.net reached out to the firm, but neither Ursem nor this site could get anyone to respond to our attempts to alert them to their leak.

Blocked on Twitter: *One executive who had ignored DataBreaches.net’s messages on LinkedIn actually blocked Ursem on LinkedIn when Ursem also reached out to him. Our attempts to notify Med-Data were not going well.*

On December 10, after other methods (including a voicemail to the executive who had ignored me) failed, DataBreaches.net left a voicemail for Med-Data’s counsel. She promptly called back, and from then on, we were taken seriously. Note: this blogger is the “independent journalist” who, according to Med-Data’s substitute notice, contacted them on December 10, although we actually began contacting them on December 8.

On December 14, at their request, DataBreaches.net provided Med-Data with links to the repositories that were exposing protected health information. Med-Data’s statement indicates that the repositories were removed by December 17.

DataBreaches.net initially held off on reporting the incident for a few reasons, but then, to be honest, just totally forgot about it.

So What Happened?

When Med-Data investigated the exposure on GitHub, they discovered that a former employee had saved files to personal folders in public repositories (yes, more than one repository). The improper exposure had begun no later than September 2019, and possibly earlier.

On February 5, 2021, cybersecurity specialists retained by Med-Data provided the company with a list of individuals whose PHI was affected by the incident. Med-Data reports:

A review of the impacted files revealed that they contained individuals’ names, in combination with one or more of the following data elements: physical address, date of birth, Social Security number, diagnosis, condition, claim information, date of service, subscriber ID (subscriber IDs may be Social Security numbers), medical procedure codes, provider name, and health insurance policy number.

That report is consistent with what we found in the exposed data.

Med-Data notified its clients on February 8, 2021, and mailed notices to affected patients on March 31. Their notice does not explain why notifications were made more than 60 days after the company learned of the exposure in December. Those affected were offered mitigation services through IDX.

In response to the incident, Med-Data has taken steps to minimize the risk of a similar event in the future. They “implemented additional security controls, blocked all file sharing websites, updated internal data policies and procedures, implemented a security operations center, and deployed a managed detection and response solution.”

What they do not yet seem to have done, however, is provide a clear way to alert them to a data security concern. Neither Ursem nor DataBreaches.net could find any link or contact method for reporting one. They need to provide a clear channel for reporting security issues, and to ensure that it is monitored by someone who can evaluate or escalate the reports.

But Were All the Data Really Removed?

One issue that arose, and that may still be unresolved because we have received no answer to our inquiry about it, involves GitHub’s Arctic Code Vault.

GitHub Arctic Code Vault

As GitHub explains it, the Arctic Code Vault is a data repository in a very-long-term archival facility. The archive is described as being located in a decommissioned coal mine in the Svalbard archipelago, closer to the North Pole than to the Arctic Circle. GitHub reportedly captured a snapshot of every active public repository on February 2, 2020, and preserved that data in the Arctic Code Vault. More details about the vault can be found on GitHub.

So what happens if copyrighted material that should not have been in a public repository is swept up into the vault? What happens if personal and sensitive material that never should have been in a public repository is swept up into the vault? Consider some of Med-Data’s code, which seems to have been swept into the vault, as indicated by the star showing that their developer and the repositories became vault contributors:

Contributor to Vault

When Ursem pointed out this vault issue to Med-Data, the company reached out to GitHub to request logs for the vault and to discuss removing code from it (depending on what the logs might show). We do not know what transpired after that, although there had been some muttering that Med-Data might sue GitHub to obtain the logs.

Did GitHub provide the logs? If so, what did they show? Is anyone’s PHI in GitHub’s Arctic Code Vault? And if so, what happens? Will GitHub remove it? Or will they claim they are immune from suit in the U.S. under Section 230 (if it still exists by then)? Or will the code just be left there for researchers to explore in 1,000 years, so they can wade through the personal and protected health information or other sensitive information of people who trusted others to protect their privacy?

Question to GitHub

In November 2020, Ursem posed the question to GitHub on Twitter. They never replied.

We hope that GitHub cooperated with Med-Data, but we raise the issue here because we will bet you that many developers and firms have never even considered how badly this could go wrong. This might be a good time to review our recommendations in “No Need to Hack When It’s Leaking.”

This article was written in collaboration with Jelle Ursem. You can contact him @SchizoDuckie on Twitter or via email to jelle[at]esctunes.com.

Update 8:01 pm: Post-publication, we found that King’s Daughters and SCL Health had also posted notices on the Med-Data breach. We know that other entities should be disclosing, so this post will be updated when we find their notices.

Update April 6: University Health in Texas and Paris Regional Medical Center in Texas have posted notices. There are more to come.

Update April 7: PeaceHealth Sacred Heart Medical Center at RiverBend and AdventHealth Shawnee Mission have posted links to Med-Data’s notice on their own sites.

Update April 10: Hospital Sisters Health System, affecting:

HSHS St. Joseph’s Hospital, Chippewa Falls,
St. Joseph’s Home Health & Hospice,
HSHS Sacred Heart Hospital, Eau Claire,
HSHS St. Anthony’s Memorial Hospital, Effingham,
HSHS St. Elizabeth’s Hospital, O’Fallon,
HSHS St. John’s Hospital, Springfield,
HSHS St. Mary’s Hospital, Decatur

Update April 10: Watertown Regional Medical Center

Update April 13: Fort Healthcare for Health (Fort Memorial Hospital). [Reminder: these are the dates on which we found the notifications, not necessarily the dates they were first posted.]

Update April 16: This incident was reported to HHS by Med-Data on April 1 as impacting 135,908 patients.

Update June 29: While investigating another incident, we noticed that University Medical Center of Southern Nevada had posted an incident notice linking to Med-Data’s announcement.