CV, Mar 31

Suppressing Non-Semantic Noise in Masked Image Modeling Representations

Martine Hjelkrem-Tan, Marius Aasan, Rwiddhi Chakraborty, Gabriel Y. Arteaga, Changkyu Choi, Adín Ramírez Rivera

arXiv:2604.00172 28.7

AI Analysis

This work addresses a specific issue in self-supervised vision learning and offers researchers and practitioners a post-hoc, training-free method to improve model performance. The contribution is incremental, since it builds on existing MIM paradigms.

The paper tackles non-semantic noise in Masked Image Modeling (MIM) representations, which hurts inference performance. It introduces Semantically Orthogonal Artifact Projection (SOAP) to suppress this noise, yielding consistent improvements in zero-shot performance across various MIM-based models.

Masked Image Modeling (MIM) has become a ubiquitous self-supervised vision paradigm. In this work, we show that MIM objectives cause the learned representations to retain non-semantic information, which ultimately hurts performance during inference. We introduce a model-agnostic score for semantic invariance using Principal Component Analysis (PCA) on real and synthetic non-semantic images. Based on this score, we propose a simple method, Semantically Orthogonal Artifact Projection (SOAP), to directly suppress non-semantic information in patch representations, leading to consistent improvements in zero-shot performance across various MIM-based models. SOAP is a post-hoc suppression method, requires zero training, and can be attached to any model as a single linear head.
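The abstract does not give SOAP's exact formulation, so the following is only a minimal sketch of the general idea it describes: estimate dominant directions with PCA on features of non-semantic images, then remove them from patch representations with a single linear map. The function name `fit_artifact_projection`, the parameter `k`, and the synthetic features are all hypothetical stand-ins, and how the paper's score selects components is not modeled here.

```python
import numpy as np


def fit_artifact_projection(noise_feats: np.ndarray, k: int) -> np.ndarray:
    """Return a (d, d) matrix that projects out the top-k principal
    directions of `noise_feats` (shape: n_samples x d).

    Assumption: the top-k PCA directions of features extracted from
    non-semantic images approximate the subspace to suppress.
    """
    centered = noise_feats - noise_feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # (k, d), orthonormal rows
    d = noise_feats.shape[1]
    # Orthogonal projector onto the complement of the artifact subspace.
    return np.eye(d) - basis.T @ basis


# Synthetic stand-in for encoder features of non-semantic images:
# most variance lies along two hidden "artifact" directions.
rng = np.random.default_rng(0)
d = 16
artifact_dirs = np.linalg.qr(rng.normal(size=(d, 2)))[0].T  # (2, d)
noise_feats = (
    5.0 * rng.normal(size=(200, 2)) @ artifact_dirs
    + 0.01 * rng.normal(size=(200, d))
)

P = fit_artifact_projection(noise_feats, k=2)

# Apply the projection to (hypothetical) patch representations.
patches = rng.normal(size=(10, d))
cleaned = patches @ P  # P is symmetric, so this equals (P @ patches.T).T
```

Because `P` is a fixed matrix, it can be applied after any frozen encoder without retraining, which is consistent with the abstract's description of a single linear head.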
