PhD thesis: Automated Analysis and Enforcement of Consent Compliance

Authors: Karel Kubicek

Abstract: Owing to their ubiquity, the collection and processing of personal data by websites have become subject to privacy regulations. In the European Union (EU), the ePrivacy Directive mandates that any data collection not strictly necessary for service provision must be consented to by users. The General Data Protection Regulation (GDPR) further stipulates that such consent must be freely given, specific, informed, and unambiguous. Personal data is any information identifying individuals, yet our primary focus lies on email addresses and browsing behavior collected by cookies.

We evaluate the extent to which the EU’s privacy regulations protect users from unsolicited emails. First, we define the legal properties of registration and newsletter sign-up processes. The combination of these properties forms a decision tree for predicting potential legal violations. We evaluate these violations on a dataset of 1000 websites that we annotated with these legal properties. Second, we train machine learning models on this annotated dataset, and together with our crawler automating the sign-up process, we scale our analysis to cover 660 202 websites. We report observations from both the manual and automated analyses, identifying websites sending marketing emails without obtaining valid consent during registration or sharing users’ email addresses with third parties.

We also examine how websites request consent for non-essential cookies. By investigating a sample of 29 398 websites featuring specific cookie notices, we identify eight potential consent violations in a surprising 94.7% of these websites. These violations stem from discrepancies between the information declared in the consent notice and the actual usage of cookies on the website or from the usage of tracking cookies prior to or despite the user’s negative consent.

Given the high prevalence of privacy violations, we propose automated methods for enforcing privacy compliance. We developed a browser extension, CookieBlock, which employs machine learning for client-side enforcement of cookie consent. Using an XGBoost model which attains competitive accuracy compared to domain experts, CookieBlock categorizes cookies by their usage purpose and filters them according to user preferences. Finally, we suggest utilizing the automated violation detection methods from both the email and cookie projects for notification campaigns to enhance website operators’ awareness of privacy regulations, or for direct enforcement by data protection authorities.

BibTeX

@phdthesis{kubicek2024phd,
  title={Automated Analysis and Enforcement of Consent Compliance},
  author={Kubicek, Karel},
  year={2024},
  school={ETH Zurich},
  doi={10.3929/ethz-b-000662039},
}

This section explains how the ePrivacy Directive and the General Data Protection Regulation affect data collection and processing, and what they require of consent. I plan to release this chapter as a standalone text that introduces the legal topic to computer scientists.

This thesis combines my publications on consent. Please refer to the pages for the individual publications.

Literature overview

I performed an SoK-like literature overview of cookie consent notices. Please find it in Section 6.

Q&A

Acknowledgement

Please see the acknowledgement section of the thesis, as many people helped me with my PhD.

Updates