CV | Dec 12, 2024

Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An, Jinhyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim

arXiv:2412.09072v1 | 20.2 | 27 citations | h-index: 10 | CVPR

Originality: Highly original

AI Analysis

This work provides a novel method for correspondence estimation that could benefit computer vision applications like 3D reconstruction and object tracking.

The paper tackles the problem of estimating correspondences between different views. It shows that the cross-attention maps in cross-view completion models capture correspondence more effectively than correlations derived from encoder or decoder features, and it reports strong results on zero-shot matching, learning-based geometric matching, and multi-frame depth estimation.

In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than other correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating it on zero-shot matching as well as on learning-based geometric matching and multi-frame depth estimation. The project page is available at https://cvlab-kaist.github.io/ZeroCo/.
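To illustrate the core idea of reading correspondences off a cross-attention map, here is a minimal sketch under stated assumptions. It computes standard scaled dot-product attention weights, softmax(Q K^T / sqrt(d)), between patch features of one view (queries) and patch features of the other view (keys), then takes the row-wise argmax as a zero-shot match. The toy features and all function names (`cross_attention`, `argmax_correspondence`, `make_toy_views`) are hypothetical; this is not the ZeroCo implementation, and the random vectors only stand in for a real decoder's query and key projections.

```python
import math
import random


def cross_attention(queries, keys):
    """Row-wise softmax(Q K^T / sqrt(d)): one attention distribution per query patch."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    attn = []
    for q in queries:
        logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def argmax_correspondence(attn):
    """For each query patch (row), return the index of the key patch with the highest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


def make_toy_views(num_patches=16, dim=64, noise=0.1, seed=0):
    """Hypothetical toy data: query patches are shuffled, slightly noisy copies of key patches."""
    rng = random.Random(seed)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]
    perm = list(range(num_patches))
    rng.shuffle(perm)
    # Query patch t depicts the same content as key patch perm[t].
    queries = [[x + rng.gauss(0.0, noise) for x in keys[perm[t]]] for t in range(num_patches)]
    return queries, keys, perm


if __name__ == "__main__":
    queries, keys, perm = make_toy_views()
    matches = argmax_correspondence(cross_attention(queries, keys))
    print("recovered all correspondences:", matches == perm)
```

In a real cross-view completion decoder, the queries and keys would be learned projections of image patch tokens and could be aggregated over attention heads and layers; this sketch does not model those choices and only shows why an attention map can be read as a soft correspondence field.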

