39.4NIMar 26
Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and ClassificationGiampaolo Bovenzi, Domenico Ciuonzo, Jonatan Krolikowski et al.
Accurate Network Traffic Classification (NTC) is increasingly constrained by limited labeled data and strict privacy requirements. While Network Traffic Generation (NTG) provides an effective means to mitigate data scarcity, conventional generative methods struggle to model the complex temporal dynamics of modern traffic or/and often incur significant computational cost. In this article, we address the NTG task using lightweight Generative Artificial Intelligence (GenAI) architectures, including transformer-based, state-space, and diffusion models designed for practical deployment. We conduct a systematic evaluation along four axes: (i) (synthetic) traffic fidelity, (ii) synthetic-only training, (iii) data augmentation under low-data regimes, and (iv) computational efficiency. Experiments on two heterogeneous datasets show that lightweight GenAI models preserve both static and temporal traffic characteristics, with transformer and state-space models closely matching real distributions across a complete set of fidelity metrics. Classifiers trained solely on synthetic traffic achieve up to 87% F1-score on real data. In low-data settings, GenAI-driven augmentation improves NTC performance by up to +40%, substantially reducing the gap with full-data training. Overall, transformer-based models provide the best trade-off between fidelity and efficiency, enabling high-quality, privacy-aware traffic synthesis with modest computational overhead.
NIOct 13, 2025
From Prompts to Packets: A View from the Network on ChatGPT, Copilot, and GeminiAntonio Montieri, Alfredo Nascita, Antonio Pescapè
Generative AI (GenAI) chatbots are now pervasive in digital ecosystems, yet their network traffic remains largely underexplored. This study presents an in-depth investigation of traffic generated by three leading chatbots (ChatGPT, Copilot, and Gemini) when accessed via Android mobile apps for both text and image generation. Using a dedicated capture architecture, we collect and label two complementary workloads: a 60-hour generic dataset with unconstrained prompts, and a controlled dataset built from identical prompts across GenAI apps and replicated via conventional messaging apps to enable one-to-one comparisons. This dual design allows us to address practical research questions on the distinctiveness of GenAI traffic, its differences from widely deployed traffic categories, and its novel implications for network usage. To this end, we provide fine-grained traffic characterization at trace, flow, and protocol levels, and model packet-sequence dynamics with Multimodal Markov Chains. Our analyses reveal app- and content-specific traffic patterns, particularly in volume, uplink/downlink profiles, and protocol adoption. We highlight the predominance of TLS, with Gemini extensively leveraging QUIC, ChatGPT exclusively using TLS 1.3, and app- and content-specific Server Name Indication (SNI) values. A payload-based occlusion analysis quantifies SNI's contribution to classification: masking it reduces F1-score by up to 20 percentage points in GenAI app traffic classification. Finally, compared with conventional messaging apps when carrying the same content, GenAI chatbots exhibit unique traffic characteristics, highlighting new stress factors for mobile networks, such as sustained upstream activity, with direct implications for network monitoring and management. We publicly release the datasets to support reproducibility and foster extensions to other use cases.