More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting

Youquan Xian, Xueying Zeng, Lingjia Meng, Lei Cui, Runhan Song, Wei Wang, Zhengquan Ding, Peng Liu, Zhiyu Hao

arXiv:2605.114024.3

Predicted impact top 42% in LG · last 90 daysOriginality Incremental advance

AI Analysis

For website fingerprinting researchers, this work addresses a critical generalization bottleneck in real-world deployment, though the method is domain-specific.

The paper tackles the poor generalization of deep learning-based website fingerprinting under geographic and temporal shifts, caused by variability in application-layer resources and cross-layer encapsulation. SATA, a semantics-aware traffic augmentation framework, improves open-world accuracy by 90.81% and AUROC by 48.37%.

Deep learning-based website fingerprinting has emerged as an effective technique for inferring the websites users visit. Although existing methods achieve strong performance on closed-world datasets, they often fail to generalize to real-world environments, especially under geographic and temporal shifts. This limitation fundamentally stems from the coupled effects of two key challenges: application-layer resource composition variability and observable feature instability induced by cross-layer encapsulation. Intertwined, these factors induce systematic shifts between underlying application semantics and observable traffic features. To address the above challenges, we propose SATA , a semantics-aware traffic augmentation framework. Specifically, SATA first performs application-layer semantic augmentation based on protocol rules, expanding the resource composition patterns within each flow and frame sequence patterns under protocol constraints. Based on these augmented frame sequences, we further introduce a cross-layer feature alignment mechanism via knowledge distillation. It aligns frame sequence with packet-length sequence features, enabling cross-layer feature alignment between enhanced semantics and observable sequences. Extensive experiments show that SATA successfully generates traffic patterns that are absent from the training set but genuinely exist in the test set, and significantly improves the performance of mainstream models across diverse and complex scenarios. In particular, in open-world settings, SATA improves ACC by 90.81% and AUROC by 48.37%. The source code of the prototype system is available at https://anonymous.4open.science/r/SATA-B6C2/.

View on arXiv PDF

Similar