Reality Check for Tor Website Fingerprinting in the Open World
This work matters to Tor users and developers: it shows that website fingerprinting attacks remain highly effective in a realistic open-world setting, a question that earlier studies had left in dispute.
This paper re-examines website fingerprinting (WF) attacks on Tor from a guard-relay perspective using a novel, privacy-preserving methodology that combines real, unlabeled Tor traffic with synthetic monitored traces. The authors find that WF remains highly effective against real Tor open-world traffic, with the best attack achieving 0.956 precision and 0.922 recall at a 9% base rate.
Website fingerprinting (WF) attacks on Tor can infer user destinations from encrypted traffic metadata. However, their real-world effectiveness remains debated due to laboratory settings that fail to capture network fluctuations, evaluate noise, and create a representative open world. In this work, we re-examine WF from a guard-relay vantage point using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. Using this methodology, we collect a large-scale dataset of over 800,000 traces. We then benchmark state-of-the-art WF attacks under a cross-network setting and show that WF remains highly effective against real Tor open-world traffic: the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate. We further present results that demonstrate robustness to small training sets, network jitter, and concept drift. Moreover, we show that timing-independent classifiers are significantly more robust to network variability than others. Finally, we provide the first systematic study of Tor's Conflux traffic-splitting, where we show that a guard node with a latency advantage can maintain high attack effectiveness even when traffic is split.
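The reported precision is tied to the 9% base rate, so it helps to see how the two interact. The sketch below is illustrative only and is not taken from the paper: it assumes the standard definitions of precision and recall, and it assumes that "base rate" means the fraction of all traces that belong to monitored sites. The function names `precision_at_base_rate` and `implied_fpr` are hypothetical helpers introduced here. Under those assumptions, the reported figures imply a false-positive rate, and holding that rate fixed shows how precision would shift at other base rates.

```python
def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of a binary monitored/unmonitored classifier.

    tpr: true-positive rate (recall) on monitored traces.
    fpr: false-positive rate on unmonitored (background) traces.
    base_rate: assumed fraction of all traces that are monitored.
    """
    true_pos = base_rate * tpr            # share of all traces correctly flagged
    false_pos = (1.0 - base_rate) * fpr   # share of all traces wrongly flagged
    if true_pos + false_pos == 0.0:
        raise ValueError("classifier flags nothing; precision is undefined")
    return true_pos / (true_pos + false_pos)


def implied_fpr(precision: float, recall: float, base_rate: float) -> float:
    """Solve the precision formula above for the false-positive rate."""
    if not 0.0 < precision <= 1.0 or not 0.0 <= base_rate < 1.0:
        raise ValueError("precision must be in (0, 1] and base_rate in [0, 1)")
    true_pos = base_rate * recall
    return true_pos * (1.0 - precision) / (precision * (1.0 - base_rate))


if __name__ == "__main__":
    # Figures quoted in the abstract: 0.956 precision, 0.922 recall, 9% base rate.
    fpr = implied_fpr(precision=0.956, recall=0.922, base_rate=0.09)
    print(f"implied false-positive rate: {fpr:.4%}")

    # Hypothetical what-if: same classifier rates, different base rates.
    for rate in (0.09, 0.05, 0.01):
        p = precision_at_base_rate(tpr=0.922, fpr=fpr, base_rate=rate)
        print(f"base rate {rate:.0%}: precision {p:.3f}")
```

The what-if loop keeps recall and the implied false-positive rate constant, which is a simplifying assumption; the paper's own numbers at other base rates, if reported, take precedence over this back-of-the-envelope estimate.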