CR · AI · LG · Jun 30, 2025

A Novel Active Learning Approach to Label One Million Unknown Malware Variants

arXiv:2507.02959v1 · 1 citation · h-index: 10 · Int J Approx Reason
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently labeling large-scale malware datasets for cybersecurity applications, representing an incremental improvement in active learning techniques for this domain.

The paper tackles the problem of labeling one million unknown malware variants by proposing two novel active learning approaches, with the Vision Transformer-based Bayesian Neural Networks (ViT-BNN) model showing improved stability and robustness in handling uncertainty compared to other methods.

Active learning for classification seeks to reduce the cost of labeling by finding the unlabeled examples about which the current model is least certain and sending them to an annotator or expert for labeling. Bayesian theory provides a probabilistic view of deep neural networks by placing a prior distribution over the model parameters and estimating uncertainty from the posterior distribution over those parameters. This paper proposes two novel active learning approaches to label one million malware examples belonging to different unknown modern malware families. The first model combines Inception-V4+PCA with several support vector machine (SVM) algorithms (UTSVM, PSVM, SVM-GSU, TBSVM). The second model is a Vision Transformer-based Bayesian Neural Network (ViT-BNN). Our proposed ViT-BNN is a state-of-the-art active learning approach that differs from current methods and can be applied to any particular task. The experiments demonstrate that ViT-BNN is more stable and robust in handling uncertainty.
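The selection step described in the abstract (pick the unlabeled examples the model is least certain about) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state which acquisition function is used, so predictive entropy over averaged stochastic forward passes is an assumption here, and the names `predictive_entropy`, `select_queries`, and `prob_samples` are hypothetical.

```python
import numpy as np


def predictive_entropy(prob_samples):
    """Entropy of the averaged predictive distribution.

    prob_samples: array of shape (S, N, C) holding class probabilities
    from S stochastic forward passes (approximate posterior samples)
    for N unlabeled examples and C classes.
    """
    mean_probs = prob_samples.mean(axis=0)  # (N, C) averaged prediction
    eps = 1e-12  # avoids log(0) for probabilities that are exactly zero
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)


def select_queries(prob_samples, k):
    """Return indices of the k most uncertain examples, most uncertain first."""
    scores = predictive_entropy(prob_samples)
    return np.argsort(-scores, kind="stable")[:k]


# Toy pool (assumed data): 2 forward passes, 3 examples, 2 classes.
toy_pool = np.array([
    [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2]],
    [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2]],
])
to_label = select_queries(toy_pool, k=2)  # indices to send to the annotator
```

In a full loop, the selected examples would be labeled by an expert, added to the training set, and the model retrained before the next round of selection.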

