CV · Apr 20

MARCO: Navigating the Unseen Space of Semantic Correspondence

arXiv:2604.1826785.11 citations · h-index: 11 · Has Code
AI Analysis

This work addresses the poor generalization of large-scale semantic correspondence models to unseen keypoints and categories, a key bottleneck for real-world usability.

MARCO introduces a training framework for semantic correspondence that improves generalization to unseen keypoints and categories. It reports state-of-the-art results on SPair-71k, AP-10K, and PF-PASCAL, with gains of +8.9 PCK@0.01 at fine-grained localization thresholds, +5.1 on unseen keypoints (SPair-U), and +4.7 on unseen categories (MP-100), while being 3x smaller and 10x faster than diffusion-based methods.

Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework that expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that amplify at fine-grained localization thresholds (+8.9 PCK@0.01) and the strongest generalization to unseen keypoints (+5.1, SPair-U) and categories (+4.7, MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
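The abstract reports gains in PCK@0.01, so a minimal sketch of how a PCK (Percentage of Correct Keypoints) score is commonly computed may help. It is not taken from the MARCO repository. The function name `pck`, its parameters, and the choice to normalize the threshold by the larger side of the object bounding box are assumptions made for illustration. Benchmarks differ on the normalizer, and some use the image size instead.

```python
import math

def pck(pred, gt, bbox_size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(bbox_size) of the ground truth.

    pred, gt:   lists of (x, y) pixel coordinates, matched by index.
    bbox_size:  (height, width) of the object bounding box (assumed normalizer).
    alpha:      tolerance as a fraction of the larger bounding-box side.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    if not gt:
        return 0.0
    threshold = alpha * max(bbox_size)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)

# Hypothetical toy data: three predicted keypoints and their annotations.
pred = [(10.0, 10.0), (53.0, 40.0), (90.0, 95.0)]
gt = [(10.5, 10.0), (50.0, 40.0), (90.0, 80.0)]
bbox = (100.0, 200.0)  # (height, width)

loose = pck(pred, gt, bbox, alpha=0.1)    # threshold of 20 pixels
strict = pck(pred, gt, bbox, alpha=0.01)  # threshold of 2 pixels
```

On this toy data all three predictions count as correct at alpha=0.1, but only the first does at alpha=0.01. This is why small alpha values measure fine-grained localization rather than coarse semantic matching.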

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
