Cross-posted from PogoWasRight.org:
In a year when both Congress and the FTC have been making noise about regulating online advertising, you would think that the industry would be eager to show that such regulation is not needed. Yet a new study released last week by researchers at Carnegie Mellon University’s CyLab suggests that not only has the industry generally not availed itself of a protocol designed to promote consumer privacy, but in many cases, web site operators may be intentionally subverting consumers’ attempts to control what cookies are placed on their computers.
In this blog entry, I interview Lorrie Cranor about the study’s findings and their implications for consumers. I have sent inquiries to Microsoft, TRUSTe, and IAB via e-mail and will include any responses I receive in subsequent blog entries.
Lorrie Cranor has been extensively involved with the development of the Platform for Privacy Preferences (P3P) standard and is one of its authors.
PWR: Many people are familiar with written privacy policies but may not know about machine-readable privacy policies. Are websites supposed to have a machine-readable full privacy policy, and do most have them?
LC: The only sites that are required by law (as far as I know) to have machine-readable privacy policies are US government websites (and when we checked a few years ago we found large numbers not in compliance). For everyone else, it’s entirely optional.
Back in the late 1990s there was a lot of discussion about industry self-regulation in the US, and the idea of machine-readable privacy policies was floated by the industry as a way to further notice and choice without burdening consumers with having to stop at every website they visit to read the privacy policy. The World Wide Web Consortium developed the Platform for Privacy Preferences (P3P) machine-readable privacy standard with input from industry, privacy advocates, and regulators. The hope had been that all websites would adopt P3P and web browsers would make it easy for users to set up their privacy preferences so that they could tell at a glance whether the site they were visiting would address their privacy needs.
So what happened? Well, Internet Explorer was the only major web browser that ever implemented P3P, and P3P adoption by websites never got to the levels we had hoped for. We did a study on P3P adoption in 2006 (see http://lorrie.cranor.org/pubs/p3p-deployment.html ) and found that about 25% of popular websites and 11% of random websites had adopted P3P.
It’s also important to note that the original idea behind P3P was not limited to privacy policies for cookies. P3P policies are supposed to cover the whole privacy policy. However, in order to make it easier for web browsers to use P3P for cookies, the P3P specification includes something called a P3P compact policy (CP). The CP is just a summary of the full P3P policy for cookies. And any site that has a CP is supposed to have a corresponding full P3P policy since the CP doesn’t have all the details found in a full policy.
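[Editor's illustration, not part of the interview.] For readers who have not seen one, a compact policy is sent in a P3P HTTP response header as a short string of tokens, optionally alongside a policyref pointer to the full policy. The following is a minimal Python sketch of pulling those two pieces apart. The function name parse_p3p_header and the example.com header value are hypothetical, chosen only for illustration, and the parsing is deliberately simplistic.

```python
import re

def parse_p3p_header(value):
    """Split a P3P header value into (policyref URI or None, list of CP tokens).

    Assumes the common form: policyref="...", CP="TOK TOK ..."
    """
    # Collect every name="value" pair found in the header value.
    fields = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', value))
    policyref = fields.get("policyref")
    tokens = fields.get("CP", "").split()
    return policyref, tokens

# Hypothetical header value, for illustration only.
example = 'policyref="http://example.com/w3c/p3p.xml", CP="NOI DSP COR NID"'
policyref, tokens = parse_p3p_header(example)
```

A header that carries a CP but no policyref comes back with None as its first element. Note that a site can also publish its full policy at a well-known location, so a missing policyref alone does not prove that no full policy exists.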
PWR: Should privacy-oriented users be concerned if a site doesn’t have a full machine-readable privacy policy?
LC: If a site has a CP and doesn’t have a full machine-readable privacy policy, our data suggests that the site might have created their CP to try to get around IE’s cookie filtering, and that seems to me to be a cause for concern.
PWR: How often did you find that sites had CPs but no full policies?
LC: Sites that have CPs must have full policies. But we found only 21% of sites with CPs actually have full P3P policies.
PWR: For sites that did have both a CP and a full machine-readable policy, how often was there a mismatch?
LC: Checking for mismatches between CP and full P3P policy or human-readable policy is fairly labor-intensive so we only did a little bit of spot checking of that and don’t have quantitative data. For the sites that we checked where we had already found CP syntax errors, we also typically found mismatches in meaning between those CPs and either the full P3P policy or human-readable policy. So for the 50 most-visited sites, there were 11 with CP errors [appendix D of the report]. Four of those had slight discrepancies and the rest did not seem to match at all.
PWR: So what happens if a site has a written privacy policy that doesn’t match their machine-readable one or the machine-readable one doesn’t match the CP?
LC: If a site has multiple policies — human-readable, machine-readable, compact, etc., and they don’t all match, then the user doesn’t really know what to trust. At least one of them must be incorrect. So I would say you can’t rely on any of them.
PWR: When someone opens a URL in their browser, does their browser agent look for both the full machine-readable policy and the CP or just the CP?
LC: By default IE6, IE7, and IE8 only look for CPs (and only if the site sends a cookie). You can go to the View menu and view the privacy report, and then IE will fetch the full P3P policy. No other major web browsers use P3P at all.
PWR: As background, what happens if there’s a mismatch between what the user wants in terms of their privacy preferences and what the site’s CP says it does?
LC: The IE cookie settings take into account the user’s preference setting and whether the cookie is a first-party or third-party cookie. The default setting will block third-party cookies when they are “unsatisfactory” and turn unsatisfactory first-party cookies into session cookies.
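[Editor's illustration, not part of the interview.] The default behavior described in this answer can be written as a small decision function. This is a sketch of the rule as stated here, not of IE's actual implementation: the names Action and default_cookie_action are hypothetical, and how the browser decides that a cookie's CP is "unsatisfactory" is not modeled; it is simply passed in as a flag.

```python
from enum import Enum

class Action(Enum):
    ACCEPT = "accept"              # cookie is stored as sent
    SESSION_ONLY = "session-only"  # cookie is kept only until the browser closes
    BLOCK = "block"                # cookie is rejected

def default_cookie_action(is_third_party, is_unsatisfactory):
    """Apply the default rule described in the interview.

    is_unsatisfactory is an input here; deciding it from the CP is out of scope.
    """
    if not is_unsatisfactory:
        return Action.ACCEPT
    # Unsatisfactory third-party cookies are blocked; unsatisfactory
    # first-party cookies are downgraded to session cookies.
    return Action.BLOCK if is_third_party else Action.SESSION_ONLY
```

Under this rule, a CP that wrongly makes a cookie look satisfactory moves it from the BLOCK or SESSION_ONLY outcome to ACCEPT, which is the risk discussed in the answers that follow.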
PWR: So if a CP contains errors, it might be telling the browser agent that it doesn’t place any cookies when in fact it does? Is that the risk?
LC: Actually it may be telling the browser agent that the cookie is being used for something that doesn’t cause privacy concerns, when in fact that’s not what the cookie is being used for.
PWR: What else can happen if the CP contains coding errors?
LC: When using IE, the biggest issue is that cookies that should be blocked or turned into session cookies will be let through as-is. More generally, since you could have browser plugins or other tools that rely on CPs, the problem is that the site is making an unreliable statement about their privacy practices that could mislead users and whatever tools the users are using.
PWR: Would it be fair to say that most of the errors you and your colleagues found are of the kind that would result in users having less privacy protection rather than more privacy protection?
LC: Yes.
PWR: How big a deal is this really, though, for users who have their browser preferences set to delete cookies after every session or when they close their browser? Are there any persisting risks or other risks?
LC: If a user is using IE and deleting cookies after every session, then they may be tracked during a browser session but not beyond that, even with these faulty CPs. What that means depends in part on how often they close their browser. I tend to leave my browser open until I have to reboot my computer, which may be a couple of weeks. Even if you don’t leave it open that long, if you go to a website that has third-party advertising that tracks you, you may find that the sites you visit next serve you ads related to that first site you visited.
PWR: You mentioned that only IE makes use of P3P. What is happening with other major web browsers?
LC: My students have developed some experimental P3P user agents, including Firefox plugins and a P3P search engine (see http://privacyfinder.org) but none of them make use of CPs, only full policies. Of course, all this evidence of sites not taking CPs seriously makes us very concerned about the state of full policies as well.
PWR: Your research also pointed out that slightly over one third of sites certified by TRUSTe had errors in the CP files and that it appeared that TRUSTe wasn’t checking for the presence of full policies and wasn’t checking the CP against the full policy to ensure its accuracy. Do you think that all vendors who are providing safety certification for sites need to start including that in their assessments?
LC: Just to clarify, that’s one third of the sites certified by TRUSTe that had CPs that had errors. There are many more TRUSTe sites that don’t use CPs in the first place.
But yes, I think anyone providing privacy certification or privacy audits really should check the full P3P policies and CPs to make sure they match the human-readable policies they are certifying.
PWR: The report also suggests that while some of the errors detected may be innocent errors due to transposing letters in the coding, it appeared that many sites may be intentionally using CP files that will override browser preferences. What made you think that this might be intentional?
LC: There were a few things that made us suspicious. First, we started noticing some of the more popular websites had CPs that were completely invalid. Facebook had a CP that just said “HONK” and Amazon had one that said “AMZN.”
Why would you go to the trouble of creating a meaningless CP except to avoid cookie blocking? Then we started noticing that there were some invalid CPs that showed up hundreds or thousands of times in our dataset. So we started Googling for those CPs and found articles that gave web administrators the advice to use these CPs to prevent IE from blocking their cookies. And then, most amazing of all, we found this advice on the Microsoft support website.
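[Editor's illustration, not part of the interview.] Strings like "HONK" and "AMZN" are easy to flag mechanically, because the compact tokens defined by P3P are three uppercase letters, some of which may carry a one-letter suffix (a, i, or o). The sketch below checks only that shape. It is a rough heuristic with a hypothetical function name, not the validation method used in the study, and it does not check tokens against the actual P3P vocabulary.

```python
import re

# Shape of a compact token: three uppercase letters, optional a/i/o suffix.
TOKEN_SHAPE = re.compile(r"[A-Z]{3}[aio]?")

def malformed_tokens(cp):
    """Return the tokens in a CP string that do not have the expected shape."""
    return [tok for tok in cp.split() if not TOKEN_SHAPE.fullmatch(tok)]
```

Both four-letter strings quoted above fail this check, while a CP such as "NOI DSP COR NID" passes. Passing only means the tokens look plausible; a full validator would also need the token list from the P3P specification and a comparison against the site's full policy.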
PWR: Do you think that the FTC needs to start investigating companies that have errors in their CP files for misrepresentation or deceptive business practices?
LC: Yes. To the extent that self-regulation remains a key component of privacy regulation in the US, it is important that the FTC play a role in enforcement. P3P is part of that self-regulatory effort, and it needs some oversight.
PWR: What’s the “take home” message for consumers? In light of your findings, what else do we need to do to protect our privacy?
LC: For consumers, I think this should raise doubts in their minds that companies actually do what they say they do in their privacy policies. They should be aware that companies are using all sorts of techniques to track them online and that these techniques may be designed to circumvent the protections in their web browser.
I would also like to add that there are currently a number of proposals floating around for new approaches to machine-readable privacy policies and privacy icons that can be shown in web browsers or incorporated into online ads. Given our experience with P3P, I think it really calls into question whether this is a workable approach without enforcement by regulators or better incentives for compliance.
Thanks to Lorrie Cranor for giving so generously of her time. I hope to have more coverage of the issues raised later this week.