CL, LG · Oct 29, 2023

Robustifying Language Models with Test-Time Adaptation

arXiv:2310.19177v1 · 3 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses robustness issues for users of large-scale language models, offering a practical defense that requires no retraining. It is an incremental advance, since it builds on prior adversarial robustness work.

The paper tackles the failure of language models on adversarial examples by proposing test-time adaptation: the input sentence is dynamically adapted using masked-word predictions to reverse the attack, repairing over 65% of adversarial attacks on two sentence classification datasets.
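The idea of adapting an input with masked-word predictions can be illustrated with a minimal sketch. This is not the paper's implementation: the context-count predictor below is a toy stand-in for a masked language model, and the names `make_toy_predictor`, `repair_sentence`, and the `threshold` parameter are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A predictor takes (tokens, index) and returns candidate words with probabilities
# for the position at `index`, as if that position were masked.
Predictor = Callable[[List[str], int], Dict[str, float]]


def make_toy_predictor(corpus: List[str]) -> Predictor:
    """Toy stand-in for a masked language model (assumption, not a real MLM).

    It counts which words appear between each (left, right) neighbor pair.
    """
    table: Dict[Tuple[str, str], Counter] = {}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            context = (tokens[i - 1], tokens[i + 1])
            table.setdefault(context, Counter())[tokens[i]] += 1

    def predict(tokens: List[str], index: int) -> Dict[str, float]:
        padded = ["<s>"] + tokens + ["</s>"]
        # padded[index + 1] is the masked word; look at its two neighbors.
        counts = table.get((padded[index], padded[index + 2]), Counter())
        total = sum(counts.values())
        return {word: n / total for word, n in counts.items()} if total else {}

    return predict


def repair_sentence(tokens: List[str], predict: Predictor,
                    threshold: float = 0.5) -> List[str]:
    """Replace words the predictor finds unlikely with its top prediction."""
    repaired = list(tokens)
    for i in range(len(repaired)):
        probs = predict(repaired, i)
        if not probs:
            continue  # unseen context: leave the word untouched
        best_word, best_prob = max(probs.items(), key=lambda kv: kv[1])
        current_prob = probs.get(repaired[i], 0.0)
        # Hypothetical rule: swap only when the current word is unlikely
        # and the predictor is confident about an alternative.
        if current_prob < threshold and best_prob >= threshold:
            repaired[i] = best_word
    return repaired


corpus = [
    "the movie was great",
    "the movie was great",
    "the film was great",
    "the movie was terrible",
]
predictor = make_toy_predictor(corpus)
print(" ".join(repair_sentence("the movie was gr8".split(), predictor)))
# the movie was great
```

In this sketch a rare but legitimate word can also be overwritten (here, "film" would be replaced by "movie"), so a practical system would need some way to decide when an input should be repaired at all.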

Large-scale language models achieved state-of-the-art performance over a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic meanings for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make the language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% o

