LG AI

Apr 21, 2025

Impact of Latent Space Dimension on IoT Botnet Detection Performance: VAE-Encoder Versus ViT-Encoder

Hassan Wasswa, Aziida Nanyonga, Timothy Lynar

arXiv:2504.14879v1

8 citations

h-index: 8

2024 3rd International Conference for Innovation in Technology (INOCON)

Originality: Synthesis-oriented

AI Analysis

This work addresses IoT security concerns by optimizing botnet detection through dimension reduction, but its contribution is incremental, as it compares existing encoders on standard datasets.

This study investigated how latent space dimension affects IoT botnet detection performance by comparing VAE-encoder and ViT-encoder based dimension reduction on N-BaIoT and CICIoT2022 datasets, finding that VAE-encoder outperformed ViT-encoder across all performance metrics (accuracy, precision, recall, F1-score).

The rapid evolution of Internet of Things (IoT) technology has led to a significant increase in the number of IoT devices, applications, and services. This surge, together with the devices' widespread presence, has made them a prime target for cyber-attacks, particularly through IoT botnets. As a result, security has become a major concern within the IoT ecosystem. This study investigates how the latent dimension affects the performance of different deep learning classifiers trained on latent vector representations of the training dataset. The primary objective is to compare the outcomes of these models when the encoder components of two cutting-edge architectures, the Vision Transformer (ViT) and the Variational Auto-Encoder (VAE), are used to project the high-dimensional training dataset onto a learned low-dimensional latent space. The encoders project high-dimensional structured .csv IoT botnet traffic datasets to various latent sizes. Evaluated on the N-BaIoT and CICIoT2022 datasets, the findings show that VAE-encoder based dimension reduction outperforms ViT-encoder based dimension reduction on both datasets in terms of accuracy, precision, recall, and F1-score for all models. This can be attributed to the absence, in these datasets, of the spatial patterns that the ViT model attempts to learn and extract from image instances.
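To make the projection step concrete, the following is a minimal sketch of how a VAE-style encoder could map a high-dimensional feature matrix to several latent sizes. It is not the authors' implementation: the helper names (`init_encoder`, `encode`), the single hidden layer of width 64, the latent sizes, and the synthetic input with a placeholder width of 115 columns are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def init_encoder(input_dim, hidden_dim, latent_dim, rng):
    """Create random (untrained) weights for a one-hidden-layer VAE encoder.

    Hypothetical helper: a real encoder would be trained jointly with a
    decoder by minimising reconstruction loss plus a KL term.
    """
    scale = 0.1
    return {
        "W_h": rng.normal(0.0, scale, (input_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_mu": rng.normal(0.0, scale, (hidden_dim, latent_dim)),
        "b_mu": np.zeros(latent_dim),
        "W_logvar": rng.normal(0.0, scale, (hidden_dim, latent_dim)),
        "b_logvar": np.zeros(latent_dim),
    }


def encode(x, params, rng=None):
    """Project rows of x to the latent space.

    With rng=None the latent mean is returned (a deterministic projection).
    With an rng, a sample is drawn via the reparameterisation trick:
    z = mu + sigma * eps, where sigma = exp(0.5 * log_var).
    """
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])  # ReLU hidden layer
    mu = h @ params["W_mu"] + params["b_mu"]
    log_var = h @ params["W_logvar"] + params["b_logvar"]
    if rng is None:
        return mu
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)

# Synthetic stand-in for a scaled traffic feature matrix (assumption):
# 32 rows, 115 columns. Real data would be loaded from the .csv files.
X = rng.normal(size=(32, 115))

# Hypothetical latent sizes to sweep over.
latent_sizes = [2, 8, 16]

projections = {}
for d in latent_sizes:
    params = init_encoder(X.shape[1], 64, d, rng)
    projections[d] = encode(X, params)  # deterministic: latent means
    print(f"latent size {d}: projected shape {projections[d].shape}")
```

In a full pipeline, each projected matrix would then be used to train a downstream classifier, so that accuracy, precision, recall, and F1-score can be compared across latent sizes and across encoder types.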
