CR, LG, Mar 24, 2025

Leveraging VAE-Derived Latent Spaces for Enhanced Malware Detection with Machine Learning Classifiers

arXiv:2503.20803v2, 2 citations, h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses malware detection in cybersecurity, but it is incremental as it applies existing methods to a new data representation.

This paper evaluates five machine learning classifiers on latent representations learned by a Variational Autoencoder from malware datasets. The ensemble methods, LightGBM and Random Forest, perform slightly better than the others, and the latent features reduce computational cost and the need for hyperparameter tuning.

This paper assesses the performance of five machine learning classifiers (Decision Tree, Naive Bayes, LightGBM, Logistic Regression, and Random Forest) using latent representations learned by a Variational Autoencoder from malware datasets. Experiments across different training-test splits and random seeds show that all the models detect malware well, with the ensemble methods (LightGBM and Random Forest) performing slightly better than the rest. In addition, the latent features reduce the computational cost of the models and the need for extensive hyperparameter tuning, which makes the models more efficient to deploy. Statistical tests show that these improvements are significant, which supports the practical relevance of combining latent space representations with traditional classifiers for malware detection in cybersecurity.
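To make "latent representations learned by a Variational Autoencoder" concrete, here is a minimal sketch of the standard VAE encoder-side computations: the encoder outputs a mean and log-variance per latent dimension, training samples from that distribution via the reparameterization trick, and a KL term regularizes it toward a standard normal. The abstract does not describe the paper's architecture, so the linear encoder, the weight names (w_mu, w_logvar), and the choice to pass the posterior mean to the downstream classifier are all assumptions made for illustration only.

```python
import math
import random


def encode(x, w_mu, w_logvar):
    """Hypothetical linear encoder (an assumption, not the paper's network).

    Returns the posterior mean and log-variance for each latent dimension.
    """
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    logvar = [sum(w * xi for w, xi in zip(row, x)) for row in w_logvar]
    return mu, logvar


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), as used during VAE training."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def latent_features(x, w_mu, w_logvar):
    """Deterministic feature vector for a downstream classifier.

    Assumption: the posterior mean is used as the feature, which is a common
    choice but is not confirmed by the abstract.
    """
    mu, _ = encode(x, w_mu, w_logvar)
    return mu


if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for repeatability
    x = [1.0, 2.0]                              # toy input, not real malware data
    w_mu = [[0.5, 0.5]]                         # one latent dimension
    w_logvar = [[0.0, 0.0]]                     # log-variance 0, so sigma = 1
    mu, logvar = encode(x, w_mu, w_logvar)
    print("features:", latent_features(x, w_mu, w_logvar))
    print("sample:", reparameterize(mu, logvar, rng))
    print("KL:", kl_to_standard_normal(mu, logvar))
```

In a pipeline like the one the abstract describes, the feature vectors from latent_features would replace the raw inputs when fitting classifiers such as Random Forest or LightGBM; because the latent vector is much shorter than the raw input, this is one plausible reading of where the reported cost reduction comes from.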

