A Theoretical Framework for OOD Robustness in Transformers using Gevrey Classes
This work addresses the robustness of Transformer models in NLP applications, but the contribution is incremental because it builds on existing theoretical concepts.
The authors study the robustness of Transformer language models under semantic out-of-distribution shifts. They develop a theoretical framework based on Wasserstein-1 distance and Gevrey-class smoothness, derive sub-exponential upper bounds on prediction error, and validate the bounds with experiments on arithmetic and Chain-of-Thought tasks.
We study the robustness of Transformer language models under semantic out-of-distribution (OOD) shifts, where training and test data lie in disjoint latent spaces. Using Wasserstein-1 distance and Gevrey-class smoothness, we derive sub-exponential upper bounds on prediction error. Our theoretical framework explains how smoothness governs generalization under distributional drift. We validate these findings through controlled experiments on arithmetic and Chain-of-Thought tasks with latent permutations and scalings. Results show empirical degradation aligns with our bounds, highlighting the geometric and functional principles underlying OOD generalization in Transformers.
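To make the notion of shift magnitude concrete, the following is a minimal sketch of how a Wasserstein-1 distance between a training latent distribution and a scaled copy of it could be estimated. It is not the authors' experimental code: the one-dimensional Gaussian latent, the scale values, and the names `wasserstein1_1d`, `train_latent`, and `shifts` are all assumptions made for illustration. It relies on the standard fact that, for two equal-size one-dimensional samples with uniform weights, the Wasserstein-1 distance equals the mean absolute difference of the sorted samples.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Empirical Wasserstein-1 distance between two equal-size 1D samples.

    With uniform weights and equal sample sizes, the optimal transport plan
    in one dimension matches sorted points, so W1 is the mean absolute
    difference of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1D samples of equal size")
    return float(np.mean(np.abs(x - y)))


# Hypothetical setup: a 1D Gaussian stands in for the training latent variable.
rng = np.random.default_rng(0)
train_latent = rng.normal(size=1000)

# Hypothetical latent scalings; scale 1.0 means no shift.
scales = [1.0, 1.5, 2.0, 4.0]
shifts = {a: wasserstein1_1d(train_latent, a * train_latent) for a in scales}

for a, w in shifts.items():
    print(f"scale={a:.1f}  W1={w:.4f}")
```

Under these assumptions the distance grows linearly in the scale factor, as |a - 1| times the mean absolute value of the sample. Such a quantity could serve as the horizontal axis when comparing measured error degradation against a bound of the kind the abstract describes.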