Steven Englehardt

CR
4 papers
209 citations
Novelty: 49%
AI Score: 25

4 Papers

CR · Jul 23, 2021
WebGraph: Capturing Advertising and Tracking Information Flows for Robust Blocking

Sandra Siby, Umar Iqbal, Steven Englehardt et al.

Millions of web users directly depend on ad and tracker blocking tools to protect their privacy. However, existing ad and tracker blockers fall short because they rely on the content of ads and trackers, which adversaries can trivially modify to evade detection. In this paper, we first demonstrate that state-of-the-art machine learning based ad and tracker blockers, such as AdGraph, are susceptible to adversarial evasions deployed in the real world. Second, we introduce WebGraph, the first graph-based machine learning blocker that detects ads and trackers based on their actions rather than their content. By building features around the actions that are fundamental to advertising and tracking (storing an identifier in the browser, or sharing an identifier with another tracker), WebGraph performs nearly as well as prior approaches but is significantly more robust to adversarial evasions. In particular, we show that WebGraph achieves accuracy comparable to AdGraph while decreasing an adversary's success rate from near-perfect under AdGraph to around 8% under WebGraph. Finally, we show that WebGraph remains robust to a more sophisticated adversary that uses evasion techniques beyond those currently deployed on the web.
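
To make the "sharing an identifier" action concrete, here is a minimal sketch, assuming a hypothetical event log of storage writes and network requests; the classes `StorageWrite` and `Request`, the function `find_identifier_sharing`, and the `min_len` cutoff are illustrative names, not WebGraph's actual implementation. It flags a flow when a stored value reappears in a request URL sent to a different host than the one that stored it.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class StorageWrite:
    # Hypothetical record: a host stores a value (e.g. a cookie) under a key.
    setter_host: str
    key: str
    value: str


@dataclass
class Request:
    # Hypothetical record: an outgoing network request and who initiated it.
    initiator_host: str
    url: str


def request_host(url: str) -> str:
    # Simplification: compares full hostnames rather than eTLD+1 domains.
    return urlsplit(url).hostname or ""


def find_identifier_sharing(writes, requests, min_len=8):
    """Return (key, destination host) pairs where a stored value appears
    in a request URL sent to a host other than the one that stored it."""
    flows = []
    for w in writes:
        if len(w.value) < min_len:  # skip short values unlikely to be IDs
            continue
        for r in requests:
            dest = request_host(r.url)
            if w.value in r.url and dest != w.setter_host:
                flows.append((w.key, dest))
    return flows
```

For example, with `writes = [StorageWrite("tracker-a.example", "uid", "a1b2c3d4e5f6")]` and a request to `https://tracker-b.example/sync?id=a1b2c3d4e5f6`, the function returns `[("uid", "tracker-b.example")]`. Because the check looks at where a value flows rather than at what the script's code or URL looks like, renaming or re-hosting the script does not by itself hide the flow, which is the intuition the abstract describes.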

CR · Aug 11, 2020
Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors

Umar Iqbal, Steven Englehardt, Zubair Shafiq

Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are accurate without degrading user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, detecting 26% more fingerprinting scripts than the state of the art. We show that an API-level fingerprinting countermeasure built upon FP-Inspector reduces website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on the top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and on over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts, suggesting that they seek to exploit APIs in new and unexpected ways.
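
One way to read "syntactic-semantic" is as combining what a script's source text says with what the script actually does at run time. The sketch below assumes exactly that split: syntactic features are literal API names in the source, and semantic features are APIs observed in an execution trace. The vocabulary `FP_APIS` and all function names are hypothetical, and no trained classifier is included; this is not FP-Inspector's actual feature set.

```python
from __future__ import annotations

# Hypothetical vocabulary of fingerprinting-relevant Web APIs (Interface.member).
FP_APIS = [
    "HTMLCanvasElement.toDataURL",
    "CanvasRenderingContext2D.fillText",
    "Navigator.plugins",
    "AudioContext.createOscillator",
    "Screen.colorDepth",
]


def syntactic_features(source: str) -> dict[str, int]:
    # Static view: 1 if the member name appears literally in the script text.
    return {f"static:{api}": int(api.split(".")[1] in source) for api in FP_APIS}


def semantic_features(trace: list[str]) -> dict[str, int]:
    # Dynamic view: 1 if the API was actually invoked during execution.
    called = set(trace)
    return {f"dynamic:{api}": int(api in called) for api in FP_APIS}


def feature_vector(source: str, trace: list[str]) -> list[int]:
    # Merge both views into one fixed-order vector a classifier could consume.
    feats = {**syntactic_features(source), **semantic_features(trace)}
    return [feats[name] for name in sorted(feats)]
```

For an obfuscated call such as `var m = "toData" + "URL"; canvas[m]();`, the static feature for `toDataURL` is 0 because the name never appears literally, while a trace containing `HTMLCanvasElement.toDataURL` sets the dynamic feature to 1. Keeping both views lets a downstream model use cheap static signals where they exist and fall back on run-time behavior where the source is obfuscated.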

CR · Mar 9, 2020
Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

Sarah Bird, Vikas Mishra, Steven Englehardt et al.

As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (≥94.9%) identified by existing heuristic techniques. We also show that the approach extends beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts, we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.
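
The core insight (similar but not identical API access patterns) can be illustrated with a nearest-neighbor sketch: each script's execution trace becomes a sparse vector of API access counts, and unlabeled scripts whose cosine similarity to a known fingerprinting script exceeds a threshold are surfaced as candidates. This is a simplified stand-in chosen for illustration, not the paper's exact algorithm; the function names, API labels, and `threshold` value are hypothetical.

```python
import math
from collections import Counter


def sparse_vector(trace):
    # Sparse representation: only APIs that were actually accessed get an entry.
    return Counter(trace)


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def surface_candidates(labeled_fp, unlabeled, threshold=0.8):
    """labeled_fp / unlabeled: {script name: list of accessed API names}.
    Return unlabeled scripts whose best similarity to any labeled
    fingerprinting script is at least `threshold`."""
    seeds = [sparse_vector(t) for t in labeled_fp.values()]
    candidates = {}
    for name, trace in unlabeled.items():
        vec = sparse_vector(trace)
        best = max((cosine(vec, s) for s in seeds), default=0.0)
        if best >= threshold:
            candidates[name] = round(best, 3)
    return candidates
```

With a labeled script accessing `canvas.toDataURL`, `canvas.fillText`, and `navigator.plugins`, an unlabeled script accessing `canvas.toDataURL`, `canvas.fillText`, and `navigator.userAgent` shares two of three APIs (similarity 2/3), so `surface_candidates(labeled, unlabeled, threshold=0.6)` returns `{"new.js": 0.667}`, while a UI script with no overlapping APIs is never surfaced. The threshold trades recall of near-matches against the manual review cost of inspecting candidates.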

HC · Nov 3, 2013
Networks of Innovation in 3D Printing

Harris Kyriakou, Steven Englehardt, Jeffrey V. Nickerson

Innovation inside companies is difficult to see. But an emerging online community of inventors who publicly post 3D CAD drawings of their work provides a way to observe, and perhaps amplify, innovation. In this paper we analyze the network structure of Thingiverse, a website oriented toward 3D printing. This form of printing blurs the line between creating information and manufacturing objects: drawings can be sent to devices that build 3D objects out of many materials, including resin, ceramics, and metal. As an exploratory study, we analyzed the structure of Thingiverse links. Our results suggest that analysis of remix network structure may provide ways of tracing innovation processes and detecting the emergence of new ideas and the combination of disparate ideas.
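
As a rough illustration of what "remix network structure" can reveal, the sketch below treats each remix link as a directed edge from a derivative design to its source, then reports the most-remixed designs (high in-degree) and derivatives that draw on two or more sources (a crude proxy for combining disparate ideas). The edge format, function name, and design names are hypothetical, and this is not the analysis performed in the paper.

```python
from collections import defaultdict


def analyze_remixes(edges):
    """edges: iterable of (derivative, source) pairs, meaning
    'derivative remixes source'. Returns (most_remixed, combinations)."""
    remixed_by = defaultdict(set)  # source -> derivatives built on it
    sources_of = defaultdict(set)  # derivative -> sources it draws on
    for derivative, source in edges:
        remixed_by[source].add(derivative)
        sources_of[derivative].add(source)
    # Sources ordered by how many distinct derivatives remix them (ties by name).
    most_remixed = sorted(remixed_by, key=lambda s: (-len(remixed_by[s]), s))
    # Derivatives that combine two or more distinct sources.
    combinations = sorted(d for d, srcs in sources_of.items() if len(srcs) >= 2)
    return most_remixed, combinations
```

For edges `[("gear_v2", "gear"), ("gear_v3", "gear"), ("gear_clock", "gear"), ("gear_clock", "clock_case")]`, the function returns `(["gear", "clock_case"], ["gear_clock"])`: `gear` is the most-remixed design, and `gear_clock` is the only derivative combining two sources.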