CR · LG · Apr 24

Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations

arXiv:2604.22629 · 18.4

Predicted impact: top 71% in CR (last 90 days) · Originality: synthesis-oriented

AI Analysis

For malware analysts, this provides a method to detect concept drift in evolving malware families, though results are incremental and no single approach dominates.

This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets, evaluated on the EMBER2024 dataset across six malware families. The most reliable configuration was fixed two-month windowing with feature-level Pearson correlation, which produced positive drift-accuracy correlations for all family pairs.
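The summary does not say exactly how "feature-level Pearson correlation" is computed. The sketch below assumes one plausible reading and should not be taken as the authors' code. It buckets samples into fixed two-month windows and takes the Pearson correlation between the feature-importance vectors of the trees trained on consecutive windows. It scores drift as 1 - r and then correlates those scores with per-transition accuracy drops. The function names, the 1 - r drift score, and the toy numbers are all hypothetical.

```python
from datetime import date
from math import sqrt
from typing import Sequence


def two_month_window(d: date) -> int:
    """Map a sample date to a fixed two-month window index (Jan-Feb, Mar-Apr, ...)."""
    return (d.year * 12 + d.month - 1) // 2


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def feature_drift(imp_prev: Sequence[float], imp_curr: Sequence[float]) -> float:
    """Hypothetical drift score: 1 - Pearson r of two feature-importance vectors.

    0 means perfectly correlated importances; 2 means perfectly anti-correlated.
    """
    return 1.0 - pearson(imp_prev, imp_curr)


def drift_accuracy_correlation(importances: Sequence[Sequence[float]],
                               accuracy_drops: Sequence[float]) -> float:
    """Correlate per-transition drift scores with per-transition accuracy drops.

    importances[t] is the feature-importance vector of the model for window t;
    accuracy_drops[t] is the accuracy lost when moving from window t to t + 1.
    """
    drifts = [feature_drift(a, b) for a, b in zip(importances, importances[1:])]
    if len(drifts) != len(accuracy_drops):
        raise ValueError("expected one accuracy drop per window transition")
    return pearson(drifts, accuracy_drops)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    windows = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.1, 0.3, 0.6]]
    drops = [0.01, 0.02, 0.15]
    print(drift_accuracy_correlation(windows, drops))  # strongly positive here
```

Under this reading, a positive drift-accuracy correlation means that the windows with the largest structural change are also the ones where accuracy falls most.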

This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are then correlated with two complementary drift indicators: accuracy degradation and data distribution shift. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration: it is the only one in which all family pairs produce positive drift-accuracy correlations. The methods are complementary; no single approach dominates across all pairs.
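The abstract names the rule-level comparison metrics but does not define them. The sketch below gives one possible reading of three of them (coverage, prediction agreement, activation stability) for rulesets extracted from decision trees. It is not the authors' implementation. The rule encoding, first-match prediction, the `benign` default label, the activation-stability formula, and all function names are illustrative assumptions.

```python
import operator
from typing import Sequence

# Hypothetical rule encoding: a conjunction of threshold tests plus a label.
OPS = {"<=": operator.le, ">": operator.gt}
Condition = tuple[int, str, float]           # (feature index, operator, threshold)
Rule = tuple[tuple[Condition, ...], str]     # (conditions, predicted label)


def fires(rule: Rule, x: Sequence[float]) -> bool:
    """True if every threshold test in the rule holds for sample x."""
    conditions, _ = rule
    return all(OPS[op](x[f], t) for f, op, t in conditions)


def predict(ruleset: Sequence[Rule], x: Sequence[float], default: str = "benign") -> str:
    """Label of the first rule that fires, else an assumed default label."""
    for rule in ruleset:
        if fires(rule, x):
            return rule[1]
    return default


def coverage(ruleset: Sequence[Rule], X: Sequence[Sequence[float]]) -> float:
    """Fraction of samples matched by at least one rule."""
    return sum(any(fires(r, x) for r in ruleset) for x in X) / len(X)


def prediction_agreement(rs_a: Sequence[Rule], rs_b: Sequence[Rule],
                         X: Sequence[Sequence[float]]) -> float:
    """Fraction of samples on which two rulesets predict the same label."""
    return sum(predict(rs_a, x) == predict(rs_b, x) for x in X) / len(X)


def activation_stability(ruleset: Sequence[Rule], X_prev: Sequence[Sequence[float]],
                         X_curr: Sequence[Sequence[float]]) -> float:
    """1 minus the mean absolute change in per-rule firing rates between windows."""
    def rates(X):
        return [sum(fires(r, x) for x in X) / len(X) for r in ruleset]
    prev, curr = rates(X_prev), rates(X_curr)
    return 1.0 - sum(abs(p - c) for p, c in zip(prev, curr)) / len(ruleset)


if __name__ == "__main__":
    # Toy rulesets from two windows: the split moves from feature 0 to feature 1.
    old = [(((0, "<=", 0.5),), "benign"), (((0, ">", 0.5),), "malware")]
    new = [(((1, "<=", 0.3),), "benign"), (((1, ">", 0.3),), "malware")]
    X_prev = [[0.2, 0.1], [0.7, 0.9], [0.8, 0.2], [0.4, 0.6]]
    X_curr = [[0.9, 0.1], [0.8, 0.5], [0.6, 0.2], [0.3, 0.3]]
    print(coverage(old, X_prev), prediction_agreement(old, new, X_prev),
          activation_stability(old, X_prev, X_curr))
```

First-match prediction is a convenience here. Rules read off the root-to-leaf paths of a single decision tree are mutually exclusive, so at most one rule fires for any sample and the order does not matter.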

