CL · AI · CR · LG · Dec 29, 2024

On Adversarial Robustness of Language Models in Transfer Learning

Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy

arXiv:2501.00066v2 · 4.9 · h-index: 49

Originality: Incremental advance

AI Analysis

This work addresses the problem of maintaining model security in real-world LLM applications, which matters to both developers and users of LLMs. It is an incremental advance because it builds on existing adversarial robustness research.

The paper investigates the adversarial robustness of large language models in transfer learning and finds that transfer learning often increases vulnerability to adversarial attacks, although larger models are more resilient.

We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications where both performance and robustness are paramount.
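The abstract describes a trade-off in which transfer learning improves standard metrics while increasing vulnerability to adversarial attacks. One common way to make such a trade-off concrete is to report, for each model, clean accuracy, accuracy on adversarially perturbed inputs, and attack success rate (the share of originally correct predictions that an attack flips). The sketch below is a minimal illustration under that assumption, not the authors' evaluation code: the function names are hypothetical, and the toy predictions are invented for illustration and are not results from the paper.

```python
from typing import Dict, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Share of examples classified correctly on clean input but wrongly after the attack.

    Assumed convention (common in text-attack evaluations, not taken from the
    paper): only examples the model originally got right are counted.
    """
    if not (len(clean_preds) == len(adv_preds) == len(labels)):
        raise ValueError("all sequences must have equal length")
    correct = [i for i, (p, y) in enumerate(zip(clean_preds, labels)) if p == y]
    if not correct:
        return 0.0
    flipped = sum(adv_preds[i] != labels[i] for i in correct)
    return flipped / len(correct)


def robustness_report(
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Bundle the three metrics for one model."""
    return {
        "clean_acc": accuracy(clean_preds, labels),
        "adv_acc": accuracy(adv_preds, labels),
        "asr": attack_success_rate(clean_preds, adv_preds, labels),
    }


# Hypothetical toy predictions for a binary classifier (invented, not paper data).
labels = [1, 0, 1, 1, 0, 1, 0, 0]

# A model trained without transfer learning (illustrative only).
baseline = robustness_report(
    clean_preds=[1, 0, 1, 0, 0, 1, 0, 1],
    adv_preds=[1, 0, 0, 0, 0, 1, 0, 1],
    labels=labels,
)

# A transfer-learned model: higher clean accuracy, but more predictions flipped by the attack.
transfer = robustness_report(
    clean_preds=[1, 0, 1, 1, 0, 1, 0, 1],
    adv_preds=[0, 0, 0, 1, 1, 1, 0, 1],
    labels=labels,
)

print("baseline:", baseline)
print("transfer:", transfer)
```

In this toy setup the transfer-learned model has higher clean accuracy but a higher attack success rate, which is the shape of the pattern the abstract reports. Measuring success only over originally correct examples keeps the attack metric from being inflated by examples the model already misclassified.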
