SST-Guard: Detecting and Characterizing Server-Side Google Analytics in the Wild
For web privacy researchers and users, this work addresses the growing problem of server-side tracking that evades existing client-side protections, providing a practical detection method.
The paper investigates server-side Google Analytics (sGA), a tracking method that bypasses client-side blocking by routing requests through intermediate endpoints. The authors present SST-Guard, a system that detects sGA by identifying semantic artifacts of data collection across network requests, cookies, and the window object, achieving 99.8% accuracy with the network request classifier and finding 4.21% of top-150k websites using sGA.
As web browsers increasingly restrict client-side tracking, the web tracking ecosystem is shifting from client-side to server-side tracking (SST). In SST, the browser sends tracking requests to an intermediate endpoint, which then forwards them to the tracker's endpoint, eliminating direct client-to-tracker requests. As a result, existing tracking protections that block requests to known tracker endpoints are rendered ineffective. In this paper, we investigate server-side implementation of Google Analytics, the most widely deployed third-party tracking service on the web today. We also present SST-Guard, a multi-modal, browser-based system for detecting and blocking server-side Google Analytics (sGA). Our key insight is that even when the tracker's endpoints change, sGA must necessarily still collect and share the same semantic information as client-side Google Analytics (e.g., identifiers, event metadata). Therefore, rather than detecting requests to known Google Analytics endpoints, SST-Guard aims to detect underlying artifacts of collection and sharing of these semantic values to any arbitrary endpoint. Operationalizing this insight is challenging because real-world sGA deployments commonly customize endpoints and obfuscate URLs/payloads. SST-Guard addresses this challenge using a value-template approach that employs regular expressions to match semantic value patterns across multiple modalities: network requests, cookies, and the window object. We validate SST-Guard on Tranco top-10k websites, detecting 4.02\% (403) sGA domains with over 93\% accuracy across three modalities, with network request classifier demonstrating the highest accuracy (99.8\%). By deploying SST-Guard in the wild, we find 4.21\% (6,314) of Tranco top-150k websites using sGA.