CV, LG · May 23, 2023

Siamese Masked Autoencoders

arXiv:2305.14344v1 · 98 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of establishing correspondence between images for computer vision applications. It offers a simple yet effective method: an incremental advance that improves performance without relying on complex pretext tasks.

The paper tackles the challenge of learning visual correspondence from videos by introducing Siamese Masked Autoencoders (SiamMAE). The method asymmetrically masks a pair of video frames, heavily masking the future frame while leaving the past frame intact, and trains the model to predict the missing patches. The resulting features outperform state-of-the-art self-supervised methods on tasks such as video object segmentation and pose keypoint propagation.

Establishing correspondence between images or scenes is a significant challenge in computer vision, especially given occlusions, viewpoint changes, and varying object appearances. In this paper, we present Siamese Masked Autoencoders (SiamMAE), a simple extension of Masked Autoencoders (MAE) for learning visual correspondence from videos. SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them. These frames are processed independently by an encoder network, and a decoder composed of a sequence of cross-attention layers is tasked with predicting the missing patches in the future frame. By masking a large fraction ($95\%$) of patches in the future frame while leaving the past frame unchanged, SiamMAE encourages the network to focus on object motion and learn object-centric representations. Despite its conceptual simplicity, features learned via SiamMAE outperform state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks. SiamMAE achieves competitive results without relying on data augmentation, handcrafted tracking-based pretext tasks, or other techniques to prevent representational collapse.
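The asymmetric masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `patchify` and `asymmetric_mask`, the 16-pixel patch size, and the 224x224 frame size are assumptions made for illustration. Only the 95% mask ratio on the future frame and the unmasked past frame come from the text above. The encoder and cross-attention decoder are omitted.

```python
import numpy as np


def patchify(frame, patch_size):
    """Split an (H, W, C) frame into rows of flattened, non-overlapping patches."""
    h, w, c = frame.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = frame.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def asymmetric_mask(past, future, patch_size=16, mask_ratio=0.95, rng=None):
    """Keep every patch of the past frame; keep only a small random
    subset of the future frame's patches (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    past_patches = patchify(past, patch_size)
    future_patches = patchify(future, patch_size)

    n = future_patches.shape[0]
    num_keep = max(1, int(round(n * (1 - mask_ratio))))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:num_keep])    # visible future patches
    masked_idx = np.sort(perm[num_keep:])  # patches the decoder must predict

    return past_patches, future_patches[keep_idx], keep_idx, masked_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    past = rng.random((224, 224, 3))
    future = rng.random((224, 224, 3))
    past_p, visible, keep_idx, masked_idx = asymmetric_mask(past, future, rng=rng)
    print(past_p.shape, visible.shape, len(masked_idx))
```

In a full model, the unmasked past patches and the few visible future patches would be encoded independently, and the decoder would be trained to reconstruct the future patches at `masked_idx`.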

