I was excited back in 2010 when HHS started posting breaches on what some would call the “wall of shame.” I knew that we’d only learn about breaches involving HIPAA-covered entities, but at least we were finally starting to get some actual data. Now, more than 6 years later, it’s become clear to me that it’s probably best to just call time of death on the breach tool, despite its popularity with marketers who look for numbers to support their sales pitches.
In this post, I review some of what we are not seeing on HHS’s breach tool, and why it’s really not a source of accurate or helpful information for those who want to understand breaches and incidents involving health or medical data.
Have You Checked the Dark Web Recently?
Last June, when TheDarkOverlord made headlines by advertising patient databases for sale at absurd prices, it was a bit of a wake-up call for me. I had never checked any dark web marketplaces for patient data that might be up for sale. Nowadays, I do, and I occasionally find evidence of breaches that have never appeared on HHS’s breach tool.
But since I’ve already mentioned TheDarkOverlord, let’s start with him/them. Can you explain why none of the three databases recently dumped had appeared on HHS’s breach tool, even though we knew about two of those claimed hacks last year? If you read this site or Protenus’s Breach Barometer, you knew about two of those three incidents last year when they were first disclosed by the hacker(s). But if you relied solely on HHS’s breach tool for your stats on hacking of patient data, you didn’t know about these incidents – and still don’t.
In contrast, an incident involving Behavioral Health Center in Maine is on HHS’s breach tool, but probably only because I discovered it on the dark web and notified the covered entity, who, in turn, notified HHS.
Here are some more dark web listings that still haven’t shown up – and may never show up – on the breach tool:
A listing for pediatric patients’ information, which I’ve previously noted on this site. Although there have been one or two pediatrics offices reporting incidents, none have reported any that would quite correspond to what the vendor is claiming to possess. And then there’s this listing:
Where did these data come from? Were these data from the Nevada incident reported by Justin Shafer – the one where the state said it had no indications of misuse of data and that private patient information was secure? The Nevada incident does not appear on HHS’s breach tool. Unfortunately, the dark web vendor would not provide me with a sample of the data, so I couldn’t confirm whether the data for sale were from Nevada, but there is no incident on HHS’s breach tool that appears to correspond to this listing.
Then there was a dark web listing for 5,200 patients’ records from an incident in Minnesota that almost certainly should be on HHS’s breach tool – except that it’s not:
The listing first appeared in a vetted forum with an asking price of $100.00. It was subsequently listed in a second forum with a $99.00 list price. The second listing identified the data as coming from a particular clinic in Minnesota: LifeMedical. DataBreaches.net was able to obtain all the data, but when I contacted LifeMedical in Minnesota and spoke to their outsourced tech firm, PriorityOne Technologies, they denied that the patient data was theirs. DataBreaches.net also contacted eClinicalWorks because there were references to them in the database. They were of no real help, however. Nor did the dark web vendor respond to a private message I sent them on the marketplace seeking further information.
So we have data that can be partially verified by public sources such as Google searches or by calling the patients directly, but whose patient data were these? If no one has accepted responsibility/ownership of this database, no one has notified HHS and probably no one has notified or warned the 5,200 patients that their data was not only acquired by criminals, but was up for sale on the dark web in at least two marketplaces.
And Oh, Those Third-Party Breaches
Ok, what might happen if a service that handles documents with sensitive information relating to workers compensation cases has a misconfigured server (and no, I’m not talking about the Systema Software leak but a much more recent incident)? DataBreaches.net was so concerned by exposed reports that @s7nsins was finding and sharing with me that I called and sent messages to a law firm whose clients’ personal and medical information were among the numerous files that were exposed.
To give you a sense of the scope of the problem, here’s just one file exposed on the server that has been redacted by DataBreaches.net. Note all the types of information that were included.
Other files included financial information about settlements, W-9 information, radiological findings, and more. Now maybe some of these records could rightfully be considered public records if they’re evidence in litigation, but even courts generally require some redaction or sealing of sensitive information, don’t they?
So how many hundreds – or thousands – or millions – of individuals may have had their personal and medical information exposed accidentally by this vendor, and you wouldn’t have even known about it except for the fact that a researcher contacted me and I just mentioned something here? How many curious researchers – or worse, criminals – may have downloaded all the data? Yes, DataBreaches.net will be following up on this one, but cannot tell you whether you will ever see it on HHS’s breach tool.
No, I’m Not Done. Not By a Long Shot.
When you think about massive exposed databases or servers like the one mentioned above, it should serve as a sobering reminder that we are only hearing about a fraction of compromised records if our only source of data is HHS’s breach tool.
So what about all the misconfigured MongoDB installations that exposed patient information or health data? And what about the misconfigured rsync backup installations that exposed patient data?
How many of the misconfigured MongoDB incidents or misconfigured rsync incidents have you seen reported on HHS’s public breach tool? Only a handful at the most, probably, because the data are often owned by entities that are not covered by or subject to HIPAA. In other cases, you may not hear about it because researchers contact me and ask me to just handle a notification but not report what they found – such as the time a researcher contacted me about prisoner medical records that were exposed due to a misconfigured server. The prison was very grateful for my call, but I never saw any report from them to HHS. Had anyone else accessed the server? I’ll likely never know – and neither will you.
And while we’re at it: how about all the SharePoint breaches that entities confessed to in a survey? Where are those reports on HHS’s breach tool?
And what about all the protected health information (PHI) exposed on public FTP servers that Justin Shafer found and reported?
Researchers continue to report tremendous amounts of data exposed by misconfigured databases, servers, and backup devices. But you probably can’t tell that from HHS’s public breach tool.
Do you remember how the FBI issued a private industry notification in March about all those public FTP incidents it claimed it was aware of? Where are those incidents on HHS’s public breach tool? Even when Shafer reported the incidents to HHS, HHS did nothing to investigate most of them and never added them to the breach tool.
Misconfigured MongoDB databases, misconfigured rsync backups, public FTP servers exposing data … tons and tons of leaking medical or health information that is never reported on HHS’s breach tool. And for the most part, this is not HHS’s fault. But can you really have any confidence in conclusions based on HHS’s public breach tool? I don’t think so.
In addition to the breach tool necessarily omitting incidents involving non-HIPAA-covered entities, DataBreaches.net has already investigated and documented a second problem or limiting factor with HHS’s breach tool: it significantly underestimates and under-reports third-party incidents. A third major problem with the breach tool is that some codes/categories are so ambiguous as to be unhelpful in trying to understand the threat landscape. Is a case of “unauthorized access/disclosure” due to human error on the part of an employee, or to willful and malicious sharing of information by an employee? Is a “hacking/IT incident” on a “network server” really a hack, or is it a case in which an employee forgot to restore a firewall after an upgrade and a search engine indexed the data?
When all the problems are taken together, it’s time to call time of death on using HHS’s breach tool as an analytic tool. It’s time for vendors to stop just rehashing numbers from the site to provide “headlines” about breaches to support their marketing when there is just so much missing or ambiguous in the breach tool data.
We need more reliable and more complete data.
Every month, DataBreaches.net provides data to Protenus, Inc. about breach incidents that were disclosed or first made public during the month. While the data includes incidents reported on HHS’s breach tool, the data goes well beyond the breach tool to provide additional details and incidents.
If you are not subscribing to Protenus’s Breach Barometer, you might want to try it. The Breach Barometer and Verizon’s DBIR are probably the two most useful tools for understanding breaches involving health or medical data, although they employ slightly different methodologies. And of course, Verizon has tons of resources, and I’m just a solo blogger/researcher. But if you’re still just rehashing numbers from HHS’s breach tool, you’re not adding to the conversation and are part of what may just be a major distraction from discussions of the more serious risks of data loss or compromise.
Interesting stuff.
Is there any proof or any studies done to prove that reporting it makes it safer for the general public?
Many of us out here are not convinced that this kind of reporting protects our children.
At this point we have to believe that ALL of our info is out there and yet the average person won’t even change their password or freeze their children’s credit with the proper agencies.
Don’t forget this little nugget… HHS considers a ransomware attack, even if you had a good backup and never paid a ransom, a reportable breach.
http://www.hipaajournal.com/ocr-ransomware-guidance-issued-3500/
Many targeted CEs simply don’t report.
http://www.healthcareitnews.com/news/ransomware-rising-where-are-all-breach-reports
That said, even if everyone did report, it still doesn’t mean HHS would actually get off their ass and do something about it, as we learn from your research.
Your government hates you… don’t forget that.
Well, if they didn’t already hate me before this week, they probably do now, as I’ve sued them under FOIA.
I guess I missed that. Do you have pro bono counsel?
I hadn’t blogged about it yet, so you didn’t miss anything. And I filed pro se.
A [wo]man who is her own lawyer has a fool for a client? I can get counsel if it appears I need it.
Just FYI. Courts do generally require redacted records, but it is not uniform and they don’t redact enough. Last names, cities of residence, ages, and all sorts of stuff that has nothing to do with the merits of the litigation. Can ya’ tell I’m a lawyer. I guess what I am seeing is an willingness to listen on the part of the medical providers. Thanks for all that you do to get the word out. Some of us are trying to pay attention.
It’s comments like yours that keep me going, JK.
Then again, basically any comment that doesn’t threaten to sue me can be considered encouragement these days. 🙂
Yep. BTW that was “unwillingness”. I love when my computer thinks for me!
I wouldn’t completely throw this data out. Something is better than nothing. The data reported to HHS is biased by those who are afraid of compliance actions or more likely to try to do the right thing but it still gives a sample of the population of breaches to examine.
It isn’t perfect, but it is useful. It is limited by the authority given in regulation.
The limitation due to statute isn’t the main problem. I would agree it could still have some use if that was the only problem, but the ambiguous coding and reporting is what really makes the tool pretty useless for any serious purposes.