CR, CL · Dec 20, 2024

Adversarial Robustness through Dynamic Ensemble Learning

Hetvi Waghela, Jaydip Sen, Sneha Rakshit

arXiv:2412.16254v12.34 citations · h-index: 6 · 2024 IEEE Silchar Subsection Conference (SILCON 2024)

Originality: Incremental advance

AI Analysis

The paper addresses adversarial robustness for pre-trained language models in NLP and offers a practical, scalable solution, though the advance is incremental because it builds on existing ensemble and adversarial training methods.

This paper tackles the problem of adversarial attacks on pre-trained language models by introducing ARDEL, a dynamic ensemble learning scheme that significantly improves robustness, reducing attack success rates and maintaining higher accuracy under adversarial conditions.

Adversarial attacks pose a significant threat to the reliability of pre-trained language models (PLMs) such as GPT, BERT, RoBERTa, and T5. This paper presents Adversarial Robustness through Dynamic Ensemble Learning (ARDEL), a novel scheme designed to enhance the robustness of PLMs against such attacks. ARDEL leverages the diversity of multiple PLMs and dynamically adjusts the ensemble configuration based on input characteristics and detected adversarial patterns. Key components of ARDEL include a meta-model for dynamic weighting, an adversarial pattern detection module, and adversarial training with regularization techniques. Comprehensive evaluations using standardized datasets and various adversarial attack scenarios demonstrate that ARDEL significantly improves robustness compared to existing methods. By dynamically reconfiguring the ensemble to prioritize the most robust models for each input, ARDEL effectively reduces attack success rates and maintains higher accuracy under adversarial conditions. This work contributes to the broader goal of developing more secure and trustworthy AI systems for real-world NLP applications, offering a practical and scalable solution to enhance adversarial resilience in PLMs.
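
The abstract describes dynamic weighting only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes each ensemble member returns class probabilities, that a meta-model supplies two per-member scores (one for clean inputs, one for suspected adversarial inputs), and that a detector returns a suspicion score in [0, 1]. The names `dynamic_ensemble`, `clean_scores`, `robust_scores`, and `adv_score`, as well as the linear interpolation rule, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_ensemble(
    member_probs: Sequence[Sequence[float]],
    clean_scores: Sequence[float],
    robust_scores: Sequence[float],
    adv_score: float,
) -> Tuple[List[float], List[float]]:
    """Combine member predictions with input-dependent weights.

    member_probs:  one class-probability vector per ensemble member.
    clean_scores:  hypothetical meta-model scores favouring clean accuracy.
    robust_scores: hypothetical meta-model scores favouring robustness.
    adv_score:     hypothetical detector output in [0, 1]; higher means
                   the input looks more adversarial.
    """
    # Assumption: interpolate linearly between the two score sets, so a
    # suspicious input shifts weight toward the more robust members.
    mixed = [
        (1.0 - adv_score) * c + adv_score * r
        for c, r in zip(clean_scores, robust_scores)
    ]
    weights = softmax(mixed)

    # Weighted average of the members' class probabilities.
    n_classes = len(member_probs[0])
    combined = [
        sum(w * probs[k] for w, probs in zip(weights, member_probs))
        for k in range(n_classes)
    ]
    return weights, combined


# Three hypothetical members, two classes.
member_probs = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
clean_scores = [2.0, 0.0, 0.0]   # member 0 is trusted on clean inputs
robust_scores = [0.0, 2.0, 1.0]  # members 1 and 2 are trusted under attack

w_clean, p_clean = dynamic_ensemble(member_probs, clean_scores, robust_scores, 0.0)
w_adv, p_adv = dynamic_ensemble(member_probs, clean_scores, robust_scores, 1.0)
```

With these made-up numbers, the same member predictions yield different ensemble decisions depending on the detector score: a low score lets member 0 dominate, while a high score shifts the weight to members 1 and 2.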
