LG · CR · Jul 11, 2025

ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection

arXiv:2507.08597v2 · 4 citations · h-index: 15 · RAID
Originality: Incremental advance
AI Analysis

This work addresses the cost of model updates faced by malware detection practitioners. It is an incremental advance: it builds on existing semi-supervised learning, applied to a domain where that approach is relatively underexplored.

The paper tackles performance degradation in malware detection models due to concept drift by introducing ADAPT, a pseudo-labeling semi-supervised algorithm, which consistently outperforms baseline models across five diverse datasets.

Machine learning models are commonly used for malware classification; however, they suffer from performance degradation over time due to concept drift. Adapting these models to changing data distributions requires frequent updates, which rely on costly ground truth annotations. While active learning can reduce the annotation burden, leveraging unlabeled data through semi-supervised learning remains a relatively underexplored approach in the context of malware detection. In this research, we introduce ADAPT, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift. Our model-agnostic method can be applied to various machine learning models, including neural networks and tree-based algorithms. We conduct extensive experiments on five diverse malware detection datasets spanning Android, Windows, and PDF domains. The results demonstrate that our method consistently outperforms baseline models and competitive benchmarks. This work paves the way for more effective adaptation of machine learning models to concept drift in malware detection.
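The abstract does not spell out ADAPT's procedure, so the following is only a minimal sketch of generic confidence-thresholded pseudo-labeling, not the ADAPT algorithm itself. It assumes a binary benign/malware task and any model exposing `fit` and `predict_proba`. The names `CentroidClassifier` and `pseudo_label_update`, the 0.9 threshold, and the toy data are all hypothetical and chosen purely for illustration.

```python
import math


class CentroidClassifier:
    """Toy stand-in model (hypothetical): scores by distance to class centroids."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        """Return P(class 1) per sample via a softmax over negative distances."""
        probs = []
        for x in X:
            d0 = math.dist(x, self.centroids[0])
            d1 = math.dist(x, self.centroids[1])
            probs.append(1.0 / (1.0 + math.exp(d1 - d0)))
        return probs


def pseudo_label_update(model, X_lab, y_lab, X_unlab, threshold=0.9):
    """One round of generic pseudo-labeling (an assumption, not ADAPT itself).

    Fit on labeled data, label the unlabeled samples the model is confident
    about, then refit on the labeled plus pseudo-labeled samples.
    """
    model.fit(X_lab, y_lab)
    X_new, y_new = list(X_lab), list(y_lab)
    n_added = 0
    for x, p1 in zip(X_unlab, model.predict_proba(X_unlab)):
        confidence = max(p1, 1.0 - p1)
        if confidence >= threshold:
            X_new.append(x)
            y_new.append(1 if p1 >= 0.5 else 0)
            n_added += 1
    model.fit(X_new, y_new)
    return model, n_added


# Toy data: label 0 = benign, label 1 = malware (made up for illustration).
X_lab = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4), (4, 5)]
y_lab = [0, 0, 0, 1, 1, 1]
# Unlabeled samples from a later period; the malware cluster has drifted.
X_unlab = [(6, 6), (7, 6), (-1, 0), (2.3, 2.3)]

model, n_added = pseudo_label_update(CentroidClassifier(), X_lab, y_lab, X_unlab)
print(n_added, model.centroids[1])
```

In this sketch the ambiguous sample near the class boundary falls below the threshold and is left out, while the confidently scored drifted samples pull the malware centroid toward the new distribution without any new ground truth labels. Because the update function only calls `fit` and `predict_proba`, the toy classifier could in principle be swapped for a neural network or tree-based model with the same interface.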

