LGCVOct 15, 2022

How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders

MIT
arXiv:2210.08344v2109 citationsh-index: 28Has Code
Originality Highly original
AI Analysis

It provides foundational insights for self-supervised learning, addressing theoretical gaps in a widely used method.

The paper tackles the lack of theoretical understanding of Masked Autoencoders (MAE) by establishing a connection to contrastive learning and analyzing mask effects, leading to downstream guarantees and a proposed U-MAE loss that improves performance on datasets like CIFAR-10 and ImageNet.

Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL) and achieve state-of-the-art performance across different benchmark datasets. However, despite its impressive empirical success, there is still limited theoretical understanding of it. In this paper, we propose a theoretical understanding of how masking matters for MAE to learn meaningful features. We establish a close connection between MAE and contrastive learning, which shows that MAE implicit aligns the mask-induced positive pairs. Built upon this connection, we develop the first downstream guarantees for MAE methods, and analyze the effect of mask ratio. Besides, as a result of the implicit alignment, we also point out the dimensional collapse issue of MAE, and propose a Uniformity-enhanced MAE (U-MAE) loss that can effectively address this issue and bring significant improvements on real-world datasets, including CIFAR-10, ImageNet-100, and ImageNet-1K. Code is available at (https://github.com/zhangq327/U-MAE).

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes