CV · Jun 8, 2023

R-MAE: Regions Meet Masked Autoencoders

Meta AI
arXiv:2306.05411v2 · 18 citations · h-index: 67 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses self-supervised image representation learning and is aimed at computer vision researchers. It offers an incremental improvement over existing masked autoencoding methods.

The paper proposes masked region autoencoding for self-supervised image representation learning, treating regions as visual analogues of words, and reports consistent improvements on detection and segmentation benchmarks with negligible computational overhead.

In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions. Specifically, we design an architecture which efficiently addresses the one-to-many mapping between images and regions, while being highly effective especially with high-quality regions. When integrated with MAE, our approach (R-MAE) demonstrates consistent improvements across various pre-training datasets and downstream detection and segmentation benchmarks, with negligible computational overheads. Beyond the quantitative evaluation, our analysis indicates the models pre-trained with masked region autoencoding unlock the potential for interactive segmentation. The code is provided at https://github.com/facebookresearch/r-mae.
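The one-to-many mapping between an image and its regions can be illustrated with a small masking sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that each region is a binary map of the same size as the image and that one random patch mask is shared between the image and all of its region maps. The names `patchify`, `sample_visible`, and `mask_image_and_regions` are hypothetical.

```python
import random


def patchify(grid, patch):
    """Split a 2D list (H x W) into patch x patch tiles, in row-major order."""
    height, width = len(grid), len(grid[0])
    assert height % patch == 0 and width % patch == 0
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([grid[r][left:left + patch]
                          for r in range(top, top + patch)])
    return tiles


def sample_visible(num_patches, mask_ratio, rng):
    """Pick the indices of patches that stay visible (MAE-style random masking)."""
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_keep])


def mask_image_and_regions(image, region_maps, patch, mask_ratio, seed=0):
    """Apply one shared random patch mask to an image and to each region map.

    Assumption (not from the source): the mask is shared across the image
    and all regions, so one image yields many masked region inputs.
    """
    rng = random.Random(seed)
    image_tiles = patchify(image, patch)
    visible = sample_visible(len(image_tiles), mask_ratio, rng)
    visible_image = [image_tiles[i] for i in visible]
    visible_regions = []
    for region in region_maps:
        region_tiles = patchify(region, patch)
        visible_regions.append([region_tiles[i] for i in visible])
    return visible, visible_image, visible_regions
```

In a sketch like this, an encoder would see only the visible tiles, and a decoder would be trained to reconstruct the masked pixels and the masked portions of each region map. How R-MAE actually handles the many regions per image efficiently is described in the paper, not here.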

Code Implementations: 1 repo
