37.3CRJun 4
SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code MutationParsa Memarzadehsaghezi, Pooria Madani, Khalil El-Khatib
Large code language models (CodeLLMs) can generate and rewrite programs, enabling functionality-preserving code mutation that may be used to create diverse malware variants and evade signature-based detection. A key security question is whether this mutation capability survives model compression, which would make deployment feasible under limited hardware budgets. We propose SecRL-Prune, a structured pruning framework for CodeLLMs that operates on feed-forward (MLP/FFN) channels. Starting from a pretrained teacher, it learns a layer-wise pruning policy with reinforcement learning using a teacher-student KL-divergence reward. To improve efficiency, we cache the teacher's top-P predictions once and compare the pruned student against this compact target, avoiding simultaneous teacher-student residency in GPU memory. We evaluate SecRL-Prune on HumanEval using pass@k for execution correctness and var@k for code diversity across three 7B CodeLLMs at 10-30% compression. SecRL-Prune consistently preserves higher pass@k and var@k than recent structured pruning baselines under aggressive pruning. In a case study on real malware samples, semantics-preserving mutations from 20%-pruned models substantially reduced detections. These results show that code mutation capability can survive significant structured pruning, highlighting the security relevance of compressed CodeLLMs.
11.3CRJun 4
Robust Ensemble of Selectively Strengthened and Augmented PredictorsParsa Memarzadehsaghezi, Zahra Hashemi, Pooria Madani et al.
Evasion attacks present a significant challenge to the robustness of machine learning (ML)-based classifiers, particularly in critical applications such as fraud detection and cybersecurity. Although existing defense mechanisms are effective in some settings, they often suffer from limited generalizability and do not systematically improve model robustness across diverse attack scenarios. To address these limitations, we introduce Robust Ensemble of Selectively Strengthened and Augmented Predictors (RESSAP), a novel framework that transforms a single classifier into an ensemble of robust classifiers. Each classifier in the ensemble is trained on a carefully selected subset of features, where feature selection is guided by a resilience metric that accounts for both feature importance and robustness. During inference, a random subset of these classifiers is used to make predictions, increasing unpredictability and improving resistance to adversarial manipulation. In addition, noise-based data augmentation is applied during training to strengthen decision boundaries and improve generalization. Our experimental results demonstrate that RESSAP significantly improves robustness against adversarial evasion attacks while maintaining strong accuracy on clean data. Overall, this model-agnostic framework provides a scalable and flexible defense strategy for enhancing the security of machine learning systems without requiring major changes to existing architectures.