Gray-Box Poisoning of Continuous Malware Ingestion Pipelines
For security practitioners deploying ML-based malware detection, this work highlights the vulnerability of continuous learning pipelines to poisoning and demonstrates a practical defense.
The paper investigates gray-box poisoning attacks on continuous malware ingestion pipelines using Import Address Table (IAT) and section injections. It shows that subtle IAT-based perturbations degrade the detection recall of a LightGBM model, and that a homogeneous ensemble defense filters up to 95.6% of poisoning attempts.
Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We then evaluate the impact of these poisoned samples when they are ingested into the defender's training set for a LightGBM malware detection model. Our empirical results show that subtle IAT-based perturbations enable compact poisoning samples that significantly degrade detection recall. They also illustrate how difficult it is to craft low-visibility adversarial perturbations that retain high poisoning efficacy in continuous learning systems. We further evaluate a defense mechanism based on a homogeneous ensemble, which identifies and filters up to 95.6% of poisoning attempts while maintaining a high retention rate for legitimate data. Together, these results underscore the need for robust pre-ingestion validation in production pipelines.
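To make the idea of pre-ingestion validation concrete, the following is a minimal sketch of a homogeneous-ensemble filter, not the paper's implementation. The abstract does not specify the ensemble's members, training data, or decision rule, so everything here is an assumption: the members are toy nearest-centroid classifiers standing in for LightGBM models, each is trained on a bootstrap resample of a trusted set, and an incoming sample is accepted only if at least half of the members agree with its claimed label. The "poison" is simply a malware-like point submitted with a benign label, which is far cruder than IAT or section injection. All names (`CentroidMember`, `train_ensemble`, `accept`, `blob`, `demo`, `min_agreement`) are hypothetical.

```python
import random

def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidMember:
    """One ensemble member: a nearest-centroid classifier.
    ASSUMPTION: a toy stand-in for a real detector such as LightGBM."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))

def train_ensemble(X, y, n_members=7, seed=0):
    """Train same-type members on bootstrap resamples of trusted data."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) < len(set(y)):
            continue  # resample if a class is missing from this bootstrap
        members.append(
            CentroidMember().fit([X[i] for i in idx], [y[i] for i in idx])
        )
    return members

def accept(members, x, claimed_label, min_agreement=0.5):
    """Accept a sample only if enough members agree with its claimed label.
    ASSUMPTION: the vote threshold is illustrative, not from the paper."""
    votes = sum(m.predict(x) == claimed_label for m in members)
    return votes / len(members) >= min_agreement

def blob(rng, center, n, sigma=0.5):
    """Synthetic 2-D feature vectors scattered around a center."""
    return [[rng.gauss(c, sigma) for c in center] for _ in range(n)]

def demo(seed=1):
    """Return (filter rate on poison, retention rate on clean data)."""
    rng = random.Random(seed)
    BENIGN, MALWARE = 0, 1
    X = blob(rng, (0.0, 0.0), 50) + blob(rng, (5.0, 5.0), 50)
    y = [BENIGN] * 50 + [MALWARE] * 50
    members = train_ensemble(X, y)

    clean = [(x, BENIGN) for x in blob(rng, (0.0, 0.0), 20)]
    # Toy poison: malware-like features submitted under a benign label.
    poison = [(x, BENIGN) for x in blob(rng, (5.0, 5.0), 20)]

    retained = sum(accept(members, x, label) for x, label in clean)
    filtered = sum(not accept(members, x, label) for x, label in poison)
    return filtered / len(poison), retained / len(clean)

filter_rate, retention_rate = demo()
print(f"filter rate: {filter_rate:.1%}, retention rate: {retention_rate:.1%}")
```

The two quantities printed at the end correspond to the two figures of merit the abstract mentions: the share of poisoning attempts filtered out and the share of legitimate data retained. On this well-separated toy data the filter is trivially effective; real adversarial binaries are designed to sit close to the benign distribution, which is why the reported 95.6% figure is an empirical result rather than something this sketch can reproduce.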