The annual report based on breaches investigated by Verizon and the U.S. Secret Service is out. On first reading of the report and the available media coverage, the big headline seems to be that while the number of records or data lost is down significantly, the number of breaches is significantly up – and more small businesses are being attacked. Some of the key findings:
- 83% of victims were targets of opportunity (no change from previous year’s estimates)
- 92% of attacks were not highly difficult (+7%)
- 76% of all data was compromised from servers (-22%)
- 86% were discovered by a third party (+25%)
- 96% of breaches were avoidable through simple or intermediate controls (<>)
- 89% of victims subject to PCI-DSS had not achieved compliance (+10%)
- 92% stemmed from external agents (+22%)
- 17% implicated insiders (-31%)
- <1% resulted from business partners (-10%)
- 9% involved multiple parties (-18%)
- 50% utilized some form of hacking (+10%)
- 49% incorporated malware (+11%)
- 29% involved physical attacks (+14%)
- 17% resulted from privilege misuse (-31%)
- 11% employed social tactics (-17%)
Although much of the report makes sense to me and is fairly congruent with what I’ve been seeing as I compile reports for this blog, it strikes me as unfortunate that Verizon-USSS do not include some comparative analyses involving all of the other breach reports that are out there and that they may not have investigated. They recorded 761 data breaches investigated globally for 2010. Possibly (if not likely), many of those are breaches that the USSS and/or Verizon investigated but that were never revealed in the media. At the same time, ITRC, which focuses on U.S. data breaches that may lead to ID theft, reported 662 incidents during 2010, and DataLossDB.org reported 405 U.S. breaches. How much overlap is there between Verizon’s 761 cases and ITRC’s and DataLossDB.org’s reports for U.S. incidents? And would inclusion of cases reported in the media or in these databases change anything in terms of our understanding of patterns of breaches?
All sources seem to agree that the number of records compromised was down in 2010. But what about patterns of breaches? Does the Verizon analysis agree with those generated by DataLossDB or ITRC? While Verizon reports that approximately half of their investigated breaches involved hacking, DataLossDB’s 2010 data indicate that only 63 of 464 incidents (13.6%) involved hacking, and ITRC’s analysis of its data indicates that hacking was involved in 17.1% of cases. Similarly, while Verizon reports that 49% of their 2010 cases involved malware, analysis of DataLossDB’s 2010 breaches indicates that malware was reported as involved in only 15 of 464 incidents (3%). Those are fairly huge differences.
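For concreteness, here’s a minimal sketch of the kind of side-by-side comparison I have in mind, using only the figures quoted above. The category names and the simple percentage math are mine for illustration; this is not an official DBIR/DLDB/ITRC reconciliation.

```python
# Illustrative arithmetic only: compare the share of incidents attributed to
# hacking and malware across sources, using the figures quoted in this post.

dbir_2010_shares = {"hacking": 0.50, "malware": 0.49}   # percentages from the DBIR
dldb_2010_counts = {"hacking": 63, "malware": 15}        # DataLossDB incident counts
dldb_2010_total = 464                                    # DataLossDB incidents analyzed

for category, dbir_share in dbir_2010_shares.items():
    dldb_share = dldb_2010_counts[category] / dldb_2010_total
    print(f"{category:8s}  DBIR {dbir_share:6.1%}  DLDB {dldb_share:6.1%}  "
          f"gap {dbir_share - dldb_share:+6.1%}")
```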
Verizon’s report is clear about its methodology and inclusion criteria, and there is value in keeping the criteria consistent across years. There is also value in analyzing breaches in which there has been some degree of confirmation or validation. But the differences noted above, plus the fact that they reported/investigated only 141 incidents for 2009 while ITRC reported 498 U.S. incidents and DataLossDB.org reported 538 U.S. incidents (618 globally), raise the possibility in my mind that some of what appears to be a huge increase in the number of incidents from 2009 to 2010 in the Verizon report may be an artifact of their 2009 estimates being much lower than what was reported publicly. Indeed, as more entities report cases to law enforcement and/or as more businesses engage Verizon to investigate breaches, their yearly numbers may climb somewhat artificially.
I wish Verizon would attempt to reconcile their database with other available databases. The basic patterns uncovered may not change, but we won’t really know with any confidence until it’s done. Of course, if Verizon and the USSS would like to disclose all of the breaches we don’t already know about, we’d be happy to enter them in our databases to see if they change impressions based on what’s publicly available. But since I don’t think they’ll do that, I would simply encourage them to address this simple question with some statistical analysis:
Is there a qualitative difference or any significant quantitative difference in pattern between the cases we know about and the cases Verizon-USSS may know about that weren’t publicly disclosed?
I tweeted this query to Alex Hutton, who says he’ll address this question next week. I look forward to seeing what he has to say.
“it strikes me at how unfortunate it is that Verizon-USSS do not include some comparative analyses involving all of the other breach reports that are out there and that they may not have investigated… I wish Verizon would attempt to reconcile their database with other available databases.”
To be fair, in 2009 we did do a basic analysis against DLDB in our supplemental, and we continue to do so on our blog. Last year we didn’t have the bandwidth, as we were busy releasing the VERIS community web app. So it’s not like we don’t have an interest or a track record; it’s a function of bandwidth. It takes some work to do a same-to-same translation, but it can be done, at least from “them to us”; we haven’t tried an “us-to-them” translation.
We go a little into patterns and “associations” between elements in the ’09 sup; I hope I’ll get to do so again soon. Also, we keep saying it would be fun to combine the community data set, the DBIR data set, and what we know from various sources (like DLDB) into a master data set. But I don’t want to do that without giving OSF money first; goodness knows there are already enough vendors not contributing back to that cause.
I certainly didn’t mean to be unfair, and I apologize if it came across as a criticism. It’s a request – not an expectation – because sometimes, I think we’ve got apples and pears and they’re hard to integrate. One additional factor that complicates reconciliation is that OSF/DLDB enters incidents under the year in which they occurred – not the year in which they were reported publicly and not the year that they get entered in the database. As I’ve pointed out before, if we try to compare DBIR for 2009 to DLDB for 2009 but conduct the analysis at the end of 2009, DLDB is likely missing hundreds of incidents that will eventually get entered for that year. That’s why I think a longer-term comparison may be helpful. Or even just take a 2-year period like 2008-2009 and ignore the year factor – just look at patterns without regard to date and see if even that holds up across respective databases.
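If anyone wants to try that pooled-window idea, here’s a rough sketch of what I mean, assuming each database can be exported to CSV with an incident date and a breach-type field. The file layout and column names below are hypothetical, not the actual DLDB or DBIR export formats.

```python
# Rough sketch of a pooled 2008-2009 comparison that ignores the year factor:
# filter each export to incidents that *occurred* in the window, then compare
# the distribution of breach types across the two sources.
import csv
from collections import Counter
from datetime import date

def type_distribution(path, start=date(2008, 1, 1), end=date(2009, 12, 31)):
    """Share of incidents per breach type that occurred within [start, end]."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            occurred = date.fromisoformat(row["incident_date"])  # hypothetical column
            if start <= occurred <= end:
                counts[row["breach_type"]] += 1                  # hypothetical column
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

# e.g. compare type_distribution("dldb_export.csv") against
#      type_distribution("dbir_export.csv") category by category
```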
And yes, OSF could use financial support. There are a lot of breaches that have yet to be backfilled for prior years as well as other breaches that need time and support to research before they’re entered. There seem to be all too many commercial entities who want to use OSF’s hard work to make money for themselves or to sell their products but do not donate to the maintenance and development of the database. There’s a lot we want to develop and do with that database, but we really could use some support to do it (Full Disclosure: I’m a curator and researcher for OSF/DLDB).
“if we try to compare DBIR for 2009 to DLDB for 2009 but conduct the analysis at the end of 2009, DLDB is likely missing hundreds of incidents that will eventually get entered for that year. That’s why I think a longer-term comparison may be helpful.”
Agreed. Personally, I don’t get too strung out about time-framing. It’s enough for us to capture “big change” when we do at this point in our nascent Newschool-ness. At some point, though, precision would be, well, bitchen.
It seems we are in clear agreement. Now all we need is some money thrown at OSF and for you to run the analyses? 🙂
On a serious note, I would really like to talk with you more to identify some key questions that I hope would get looked at in any such analysis. So before you start anything, please drop me a note at admin[at]databreaches.net or research[at]opensecurityfoundation.org and let’s see if we can home in on a few critical comparisons. I’ve been blogging for about 5 years now about the apples/pears frustration of comparing databases and analyses, and it would be great to finally have some actual data on that point. I know any such analysis will still need a gadzillion qualifiers and will raise as many questions as it answers, but I do think this is a great time to look at some of this.
Check out the 2009 Supplemental Report – Appendix A: Comparison of Verizon IR dataset to DataLossDB here:
http://www.verizonbusiness.com/resources/security/reports/rp_2009-data-breach-investigations-supplemental-report_en_xg.pdf
And yes, Alex feel free to send large amounts of money to help the research the Open Security Foundation does at DataLossDB.org =)
Thanks for posting the link, Jake. I’m looking at that appendix again because, frankly, I had forgotten about it when I blogged this morning, possibly because I’m old and I forget, but probably because it couldn’t really address the question I still have: whether the cases we know about differ from the cases we (the public) do not know about. It’s one thing to say that cases resulting in confirmed data compromise may show a different pattern than those resulting from improper disposal or lost devices, where there may be tons of data that never get misused. It’s another thing if we suddenly discover that everything we think we’re learning from public disclosure badly underestimates where the real threats are and which sectors or subsectors are accounting for most of the breaches. So yeah, I think it’s an important question and we need to continue to try to look at that.
That said, the points in their appendix are very well taken. Have they offered us any bribe or inducement to change our coding system? 🙂
I’ll join in. In our previous reports, we have provided a stat on the percentage of breaches not reported publicly. We did not do that this go-round because the scales tipped dramatically, with the USSS contributing many more cases than Verizon. When they provide case data to us, victim names are removed, so we truly have no way of cross-referencing those with publicly announced breaches.
With respect to the difference highlighted above between Verizon’s stats around hacking and malware vs. ITRC and DataLossDB: most of that is due to Verizon being a third-party forensic shop and the USSS being law enforcement. Neither of us is likely to be called in if an org loses a laptop or backup tapes, which is a very common loss event in those sources.
It is a good request, nevertheless. One of these days, we’ll try to run some analysis to see if any significant differences emerge between non-reported cases and those that are publicly known.
Thanks for the write-up.
Thanks for your thoughts on this, Wade. One of the arguments that goes round and round is the internal vs. external issue, so that’s something I’d be curious about in terms of whether the cases we don’t know about split differently on that. Maybe there’s a tad of conspiracy theory at work here, but yeah, why aren’t we finding out about some breaches? Must have something to hide, right? 🙂