CR | AI | Aug 30, 2025

Cross-Domain Malware Detection via Probability-Level Fusion of Lightweight Gradient Boosting Models

arXiv:2509.00476v1 | 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generalizing malware detection across diverse data sources in cybersecurity applications. The advance is incremental: it builds on existing methods and adds a fusion step.

The paper tackles cross-domain malware detection by proposing a lightweight framework that fuses probabilities from LightGBM models trained on three distinct datasets, achieving a macro F1-score of 0.823 on cross-domain validation and outperforming individual models.

The escalating sophistication of malware necessitates robust detection mechanisms that generalize across diverse data sources. Traditional single-dataset models struggle with cross-domain generalization and often incur high computational costs. This paper presents a novel, lightweight framework for malware detection that employs probability-level fusion across three distinct datasets: EMBER (static features), API Call Sequences (behavioral features), and CIC Obfuscated Memory (memory patterns). Our method trains individual LightGBM classifiers on each dataset, selects top predictive features to ensure efficiency, and fuses their prediction probabilities using optimized weights determined via grid search. Extensive experiments demonstrate that our fusion approach achieves a macro F1-score of 0.823 on a cross-domain validation set, significantly outperforming individual models and providing superior generalization. The framework maintains low computational overhead, making it suitable for real-time deployment, and all code and data are provided for full reproducibility.
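The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each of the three classifiers has already produced a malware probability for every sample in a shared validation set, and it searches a coarse grid of non-negative weights that sum to 1, keeping the combination with the highest macro F1. The function names, the 0.1 grid step, the 0.5 decision threshold, and the toy probabilities are all hypothetical; in practice the probabilities would come from the trained LightGBM models.

```python
from itertools import product


def macro_f1(y_true, y_pred):
    """Macro F1 for binary labels: unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fuse(probs, weights):
    """Weighted sum of per-model malware probabilities, one value per sample.

    probs is a list of per-model lists, all of the same length.
    """
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*probs)]


def grid_search_weights(probs, y_true, step=0.1, threshold=0.5):
    """Try every weight triple on a grid over the simplex (assumed search space)."""
    n = round(1 / step)
    best_w, best_f1 = None, -1.0
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # the third weight would be negative
        w = (i / n, j / n, (n - i - j) / n)
        preds = [int(p >= threshold) for p in fuse(probs, w)]
        score = macro_f1(y_true, preds)
        if score > best_f1:
            best_w, best_f1 = w, score
    return best_w, best_f1


# Toy, made-up probabilities standing in for the three models' outputs.
y_val = [1, 0, 1, 0, 1, 0]
p_static = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]    # e.g. static-feature model
p_api = [0.6, 0.4, 0.7, 0.3, 0.3, 0.5]       # e.g. API-call model
p_memory = [0.7, 0.1, 0.6, 0.4, 0.45, 0.3]   # e.g. memory-pattern model

weights, score = grid_search_weights([p_static, p_api, p_memory], y_val)
print(weights, score)
```

Because the grid includes the corner points (1, 0, 0), (0, 1, 0), and (0, 0, 1), the selected fusion can never score below the best single model on the set used for the search. A held-out set is still needed to judge whether the chosen weights generalize.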
