Over on DataViper.io, Vinny Troia reports that he and Bob Diachenko found a massive data leak that appears to implicate two data enrichment firms: People Data Labs (PDL) and OxyData.io. But “implicate” is not the same thing as being able to attribute ownership of the Elasticsearch server that was open at 35.199.58.125, and both companies denied owning the server.
In terms of the quantity and types of data exposed, Troia reports that after deduplication, the PDL user records revealed roughly 1.2 billion unique people and 650 million unique email addresses, which is in line with the statistics provided on PDL's website. The data within the three different PDL indexes also varied slightly, with some focusing on scraped LinkedIn information, email addresses, and phone numbers, while other indexes provided information on individual social media profiles such as a person’s Facebook, Twitter, and GitHub URLs.
The OxyData.io data, in contrast, “revealed an almost complete scrape of LinkedIN data, including recruiter information.”
But with neither PDL nor OxyData claiming ownership (denials that Troia and Diachenko seem to believe based on their research), the public is left with at least a few questions:
- If the exposed data is not a 100% match with PDL and OxyData (and it does not appear to be), then who processed the PDL data to produce the data found on this server?
- The data includes records of Canadian, UK, and U.S. citizens. Does the database violate GDPR? Did whoever is responsible for the data obtain consent from any EU persons?
As Troia notes, it is often difficult, and in this case was impossible, for the researchers to determine ownership. Google Cloud would know, but it will never tell except under legal process. Google is willing to call a customer and alert them to a security concern, but it will not disclose the name of the customer to whoever reports the leak, and it cannot make the customer do anything.
But maybe a complaint to the ICO would result in more transparency?