CV · LG · Jun 24, 2021

Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers

arXiv:2106.13122v2 · 18 citations · Has code
Originality: Incremental advance
AI Analysis

This work addresses corruption robustness in computer vision and is relevant to both researchers and practitioners, though it is an incremental advance, since it builds on existing architectures.

The study compares the inherent corruption robustness of vision transformers with that of ResNet-50 and MLP-Mixers. It finds that vision transformers are more robust, and that vision transformers with about five times fewer parameters than ResNet-50 still show greater shape bias.
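The listing does not say how shape bias is measured. A common definition is the cue-conflict metric of Geirhos et al. (2019): each image has a shape class and a different texture class, and shape bias is the fraction of shape decisions among the predictions that match either cue. The sketch below assumes that definition; the function name `shape_bias` and its inputs are hypothetical and not taken from the paper's code.

```python
from typing import Sequence


def shape_bias(predictions: Sequence[str],
               shape_labels: Sequence[str],
               texture_labels: Sequence[str]) -> float:
    """Fraction of cue-conflict decisions that follow shape rather than texture.

    Hypothetical helper assuming the cue-conflict definition of shape bias:
    only predictions matching the shape class or the texture class count;
    predictions matching neither are ignored.
    """
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            # No conflict between the cues, so the image says nothing
            # about shape vs. texture preference; skip it.
            continue
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1

    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched either cue")
    return shape_hits / decided


# Toy labels for illustration only (not data from the paper).
preds = ["cat", "elephant", "dog", "car"]
shapes = ["cat", "bottle", "dog", "knife"]
textures = ["elephant", "elephant", "clock", "bird"]
print(round(shape_bias(preds, shapes, textures), 3))  # 2 shape vs. 1 texture decision
```

A value near 1 means the model mostly classifies by global shape, and a value near 0 means it mostly follows local texture. The fourth toy prediction matches neither cue, so it does not affect the score.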

Recently, vision transformers and MLP-based models have been developed to address some of the prevalent weaknesses of convolutional neural networks. Because transformers and their self-attention mechanism are new to this domain, it remains unclear to what degree these architectures are robust to corruptions. Although some works propose that data augmentation remains essential for a model to be robust against corruptions, we explore the impact that the architecture itself has on corruption robustness. We find that vision transformer architectures are inherently more robust to corruptions than ResNet-50 and MLP-Mixers. We also find that vision transformers with 5 times fewer parameters than a ResNet-50 have more shape bias. Our code is available for reproducing our results.
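The abstract does not name its robustness metric. A standard one for ImageNet-C, defined by Hendrycks and Dietterich (2019), is the mean Corruption Error (mCE). For each corruption type, the model's top-1 error rates are summed over the five severity levels and divided by the baseline's summed errors, which is AlexNet in the original benchmark. The per-corruption ratios are then averaged. The sketch below assumes that definition. The function names and the error rates in the example are hypothetical, for illustration only, and are not results from the paper.

```python
from typing import Mapping, Sequence


def corruption_error(model_errors: Sequence[float],
                     baseline_errors: Sequence[float]) -> float:
    """Corruption Error for one corruption type.

    Sums top-1 error rates over severity levels and normalises by the
    baseline model's summed error rates for the same corruption.
    """
    if len(model_errors) != len(baseline_errors):
        raise ValueError("need one error rate per severity for both models")
    return sum(model_errors) / sum(baseline_errors)


def mean_corruption_error(model: Mapping[str, Sequence[float]],
                          baseline: Mapping[str, Sequence[float]]) -> float:
    """Average Corruption Error over every corruption type listed for `model`."""
    ces = [corruption_error(model[c], baseline[c]) for c in model]
    return sum(ces) / len(ces)


# Hypothetical top-1 error rates at severities 1..5 (illustrative numbers only).
model = {
    "gaussian_noise": [0.2, 0.3, 0.4, 0.5, 0.6],
    "fog": [0.1, 0.15, 0.2, 0.25, 0.3],
}
baseline = {
    "gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
    "fog": [0.3, 0.35, 0.4, 0.45, 0.5],
}
print(f"mCE = {100 * mean_corruption_error(model, baseline):.1f}%")
```

Normalising by a fixed baseline lets corruptions of very different difficulty contribute comparably to the average. A lower mCE means a more corruption-robust model.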

Code implementations: 1 repo