CV | Dec 12, 2024

Cross-View Completion Models are Zero-shot Correspondence Estimators

arXiv:2412.09072v1 | 26 citations | h-index: 10 | CVPR
Originality: Highly original
AI Analysis

This work provides a novel method for correspondence estimation that could benefit computer vision applications like 3D reconstruction and object tracking.

The paper tackles the problem of estimating correspondences between different views by showing that cross-attention maps in cross-view completion models capture correspondence more effectively than other features, achieving strong results in zero-shot matching and learning-based geometric tasks.

In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than other correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating it on zero-shot matching, learning-based geometric matching, and multi-frame depth estimation. The project page is available at https://cvlab-kaist.github.io/ZeroCo/.
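To make the idea of reading correspondences from a cross-attention map concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a single cross-attention map that is already averaged over heads and stored as a row-stochastic array, in which row i holds the weights from target patch i over all source patches. The function name `attention_to_matches` and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np

def attention_to_matches(attn, grid_hw):
    """Pick, for every target patch, the source patch with the highest attention.

    attn    : (H*W, H*W) array; row i = attention of target patch i over source patches
              (assumed layout, not taken from the paper).
    grid_hw : (H, W) size of the patch grid.
    Returns an (H, W, 2) integer array of (row, col) source-patch coordinates.
    """
    h, w = grid_hw
    assert attn.shape == (h * w, h * w)
    best = attn.argmax(axis=1)          # flat index of the best source patch per target patch
    rows, cols = np.divmod(best, w)     # convert flat index to grid coordinates
    return np.stack([rows, cols], axis=-1).reshape(h, w, 2)

# Toy example: a 2x3 patch grid where the source view is a cyclic shift of the target.
h, w = 2, 3
perm = np.roll(np.arange(h * w), 1)     # target patch i corresponds to source patch perm[i]
attn = np.full((h * w, h * w), 0.01)
attn[np.arange(h * w), perm] = 1.0
attn /= attn.sum(axis=1, keepdims=True) # normalise rows like a softmax output
matches = attention_to_matches(attn, (h, w))
```

A hard argmax is the simplest choice; a soft-argmax (the attention-weighted average of source coordinates) would give sub-patch estimates, at the cost of blurring multi-modal rows.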

