LG · CV · ML · May 20, 2025

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners

arXiv:2505.14042v1 · 2 citations · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the computational inefficiency of adversarial defenses, a practical concern for machine learning practitioners. It is an incremental advance because it builds on existing adversarial training and in-context learning methods.

The paper tackles the high computational cost of adversarial training. It shows that adversarially pretrained transformers can serve as robust foundation models, so downstream tasks need no adversarial training of their own. The authors demonstrate theoretically that, through in-context learning, such a model generalizes robustly to unseen tasks without parameter updates.

Adversarial training is one of the most effective adversarial defenses, but it incurs a high computational cost. In this study, we show that transformers adversarially pretrained on diverse tasks can serve as robust foundation models and eliminate the need for adversarial training in downstream tasks. Specifically, we theoretically demonstrate that through in-context learning, a single adversarially pretrained transformer can robustly generalize to multiple unseen tasks without any additional training, i.e., without any parameter updates. This robustness stems from the model's focus on robust features and its resistance to attacks that exploit non-predictive features. Besides these positive findings, we also identify several limitations. Under certain conditions (though unrealistic), no universally robust single-layer transformers exist. Moreover, robust transformers exhibit an accuracy-robustness trade-off and require a large number of in-context demonstrations. The code is available at https://github.com/s-kumano/universally-robust-in-context-learner.
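
To make the setting in the abstract concrete, here is a minimal sketch of in-context prediction under a worst-case perturbation. It is an illustration only, not the paper's model or the code in the linked repository. The function names (`in_context_predict`, `worst_case_linf`), the synthetic task, and the linear-attention-style readout are all assumptions made for this example.

```python
import numpy as np

def in_context_predict(demo_x, demo_y, query_x):
    """Score a query from N labelled demonstrations, with no parameter updates.

    The score averages y_n * <x_n, query>, a linear-attention-style readout.
    This is an illustrative stand-in, not a trained transformer.
    """
    w = (demo_y[:, None] * demo_x).mean(axis=0)  # implicit linear classifier
    return w @ query_x, w

def worst_case_linf(query_x, query_y, w, eps):
    """Worst-case l_inf perturbation of size eps against the linear score w @ x."""
    return query_x - eps * query_y * np.sign(w)

rng = np.random.default_rng(0)
d, n, eps = 20, 200, 0.1

# Hypothetical task: feature 0 is strongly predictive; the rest are small noise.
demo_y = rng.choice([-1.0, 1.0], size=n)
demo_x = rng.normal(scale=0.1, size=(n, d))
demo_x[:, 0] += demo_y

query_y = 1.0
query_x = rng.normal(scale=0.1, size=d)
query_x[0] += query_y

clean_score, w = in_context_predict(demo_x, demo_y, query_x)
adv_x = worst_case_linf(query_x, query_y, w, eps)
adv_score = w @ adv_x

print("clean score:", clean_score)
print("adversarial score:", adv_score)
```

For a linear readout, the attack lowers the score by exactly `eps` times the l1 norm of `w`. A predictor that puts little weight on the many weakly predictive features therefore loses less margin, which is the intuition behind the abstract's remark about focusing on robust features.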

Code Implementations: 1 repo