NetVAD: Foundation-Model Representation Learning for Identifier-Free Unsupervised Intrusion Detection
For network security practitioners, NetVAD narrows the gap between unsupervised and supervised IDS performance on zero-day attacks, though its reliance on a flow-based foundation model limits detection of certain attack types, such as single-packet reconnaissance.
NetVAD pairs a frozen network foundation model with a variational autoencoder for unsupervised intrusion detection, reaching 98% Micro F1 and 96% Macro F1 on ToN-IoT at an operational false positive rate, but it struggles with single-packet reconnaissance.
Detecting zero-day exploits in production networks requires robust Intrusion Detection Systems (IDS). However, current unsupervised models struggle to match the performance of supervised classifiers, which are trained only on specific, known attacks. To bridge this gap, we leverage the emerging capabilities of Network Foundation Models. We propose \textit{NetVAD}, a strictly identifier-free Variational Autoencoder that projects representations from a frozen Foundation Model into a task-specific latent space and is trained solely on benign traffic. Evaluated on ToN-IoT and IoT-23, NetVAD achieves highly competitive unsupervised performance: on ToN-IoT, it reaches a 98% Micro F1-score and a 96% Macro F1-score at an operational false positive rate. Unlike prior work, we report the model's performance transparently for every attack class in the datasets. While the architecture excels at discerning complex botnet behaviour (99.6% F1 on Okiru), our evaluation reveals limitations of flow-based Foundation Models in detecting single-packet reconnaissance events. Finally, a comprehensive ablation study confirms that large-scale pre-training is essential to prevent performance degradation, but that specialised decoder architectures are also necessary to model the complex benign manifold precisely, so that attacks produce a higher reconstruction loss and are caught more reliably.
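
The detection step described above (score each sample by reconstruction loss, then flag scores above a threshold chosen for a target false positive rate) might look roughly like the following minimal sketch. It is an illustration only, not the paper's implementation: the function names, the use of mean squared error as the reconstruction loss, and the quantile-based threshold calibrated on held-out benign scores are all assumptions.

```python
import math

import numpy as np


def reconstruction_scores(embeddings, reconstructions):
    """Per-sample mean squared error between inputs and their reconstructions.

    Assumption: MSE stands in for the reconstruction loss; the paper may use
    a different form.
    """
    x = np.asarray(embeddings, dtype=float)
    x_hat = np.asarray(reconstructions, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)


def threshold_at_fpr(benign_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of benign scores exceed it.

    Hypothetical calibration step: the threshold is an order statistic of
    scores computed on held-out benign traffic.
    """
    scores = np.sort(np.asarray(benign_scores, dtype=float))
    n = scores.size
    # Index of the smallest score that leaves at most target_fpr * n above it.
    k = math.ceil((1.0 - target_fpr) * n) - 1
    k = min(max(k, 0), n - 1)
    return float(scores[k])


def flag_anomalies(scores, threshold):
    """A sample is flagged as an attack if its score is strictly above the threshold."""
    return np.asarray(scores, dtype=float) > threshold
```

A typical use would calibrate `threshold_at_fpr` once on benign validation scores and then apply `flag_anomalies` to scores from live traffic; the achievable false positive rate on new data depends on how well the validation set reflects production traffic.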