LG | Jul 28, 2025

Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

arXiv:2507.20453v3 | 3 citations | h-index: 2 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work is relevant to practitioners who train Transformers on noisy or imperfect data, but its contribution is incremental, since it compares existing attention variants.

The study evaluates how robust different self-attention mechanisms in Vision Transformers are to noise and spurious correlations. It finds that Doubly Stochastic attention is the most robust, outperforming the next-best mechanism by 0.1% to 5.1% when the training data, or both the training and testing data, are corrupted.

Self-attention mechanisms are foundational to Transformer architectures, supporting their impressive success in a wide range of tasks. While there are many self-attention variants, their robustness to noise and spurious correlations has not been well studied. This study evaluates Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine attention within Vision Transformers under different data corruption scenarios. Through testing across the CIFAR-10, CIFAR-100, and Imagenette datasets, we show that Doubly Stochastic attention is the most robust. It consistently outperformed the next best mechanism by $0.1\%-5.1\%$ when training data, or both training and testing data, were corrupted. Our findings inform self-attention selection in contexts with imperfect data. The code used is available at https://github.com/ctamayor/NeurIPS-Robustness-ViT.
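To make the five compared mechanisms concrete, here is a minimal NumPy sketch of how each one might turn queries and keys into an attention-weight matrix. The abstract gives no formulas, so every form below is an assumption based on common formulations: the `elu(x) + 1` feature map for Linear attention, raw cosine similarity for Cosine attention, and Sinkhorn row/column normalization for Doubly Stochastic attention. The function names are hypothetical, and the linked repository may implement these differently.

```python
import numpy as np

def scaled_scores(q, k):
    """Scaled dot-product scores of shape (n_queries, n_keys)."""
    return q @ k.T / np.sqrt(q.shape[-1])

def softmax_weights(q, k):
    s = scaled_scores(q, k)
    w = np.exp(s - s.max(axis=-1, keepdims=True))  # subtract row max for stability
    return w / w.sum(axis=-1, keepdims=True)       # each row sums to 1

def sigmoid_weights(q, k):
    # Elementwise sigmoid; rows are not normalized.
    return 1.0 / (1.0 + np.exp(-scaled_scores(q, k)))

def linear_weights(q, k):
    # Assumed feature map phi(x) = elu(x) + 1, which keeps all entries positive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))
    w = phi(q) @ phi(k).T
    return w / w.sum(axis=-1, keepdims=True)

def cosine_weights(q, k):
    # Assumed form: raw cosine similarity without softmax; entries lie in [-1, 1].
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    return qn @ kn.T

def doubly_stochastic_weights(q, k, n_iters=50):
    # Assumed form: Sinkhorn iterations on exp(scores), alternating row and
    # column normalization. Requires a square matrix (self-attention).
    s = scaled_scores(q, k)
    w = np.exp(s - s.max())
    for _ in range(n_iters):
        w = w / w.sum(axis=1, keepdims=True)  # rows sum to 1
        w = w / w.sum(axis=0, keepdims=True)  # columns sum to 1
    return w

# Toy self-attention input: 4 tokens with 8-dimensional heads.
rng = np.random.default_rng(0)
q, k, v = (0.5 * rng.normal(size=(4, 8)) for _ in range(3))

variants = {
    "softmax": softmax_weights,
    "sigmoid": sigmoid_weights,
    "linear": linear_weights,
    "cosine": cosine_weights,
    "doubly_stochastic": doubly_stochastic_weights,
}
# Each variant mixes the value vectors with its own weight matrix.
outputs = {name: fn(q, k) @ v for name, fn in variants.items()}
```

In this sketch the variants differ only in how the weight matrix is normalized. Softmax and Linear normalize each row, Sigmoid and Cosine do not normalize at all, and the Doubly Stochastic variant normalizes rows and columns together, so every token both distributes and receives a total weight of about one.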

