CV · Jul 1, 2025

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space

arXiv:2507.00392v2 · 1 citation · h-index: 5
AI Analysis

The work addresses the reliance of existing methods on scarce multi-view data, which could improve feature matching in diverse and challenging computer vision scenarios.

The paper tackles the problem of feature matching in computer vision by proposing a two-stage framework that lifts 2D images to 3D space, achieving superior generalization across zero-shot evaluation benchmarks.

Feature matching plays a fundamental role in many computer vision tasks, yet existing methods rely heavily on scarce and clean multi-view image collections, which constrains their generalization to diverse and challenging scenarios. Moreover, conventional feature encoders are typically trained on single-view 2D images, limiting their capacity to capture 3D-aware correspondences. In this paper, we propose a novel two-stage framework that lifts 2D images to 3D space, named Lift to Match (L2M), taking full advantage of large-scale and diverse single-view images. Specifically, in the first stage, we learn a 3D-aware feature encoder using a combination of multi-view image synthesis and 3D feature Gaussian representation, which injects 3D geometry knowledge into the encoder. In the second stage, a novel-view rendering strategy, combined with large-scale synthetic data generation from single-view images, is employed to learn a feature decoder for robust feature matching, thus achieving generalization across diverse domains. Extensive experiments demonstrate that our method achieves superior generalization across zero-shot evaluation benchmarks, highlighting the effectiveness of the proposed framework for robust feature matching.
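
The geometric idea behind building matching supervision from a single image can be illustrated with a minimal sketch. This is not the L2M implementation: the abstract describes multi-view image synthesis, 3D feature Gaussians, and novel-view rendering, none of which appear here. The sketch only assumes a pinhole camera with intrinsics `K`, a per-pixel depth map (for example from a monocular depth estimator, which is an assumption), and a hypothetical relative pose `(R, t)` for a synthetic second view. The function name `lift_and_reproject` is illustrative.

```python
import numpy as np

def lift_and_reproject(depth, K, R, t):
    """Lift every pixel to 3D using its depth, move it into a second
    (synthetic) camera frame, and project it back to pixel coordinates.

    Hypothetical helper for illustration only; not the authors' method.
    Returns (h, w, 2) target coordinates and an (h, w) validity mask.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, h*w); row 0 is u (x), row 1 is v (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)

    # Back-project: X = depth * K^{-1} [u, v, 1]^T.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()

    # Rigid transform into the new camera frame: X' = R X + t.
    pts_new = R @ pts + t.reshape(3, 1)

    # Perspective projection, guarding against points behind the camera.
    z = pts_new[2]
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)
    uv_new = (K @ pts_new)[:2] / z_safe

    # A correspondence is usable only if it lands inside the image.
    inside = (
        in_front
        & (uv_new[0] >= 0) & (uv_new[0] <= w - 1)
        & (uv_new[1] >= 0) & (uv_new[1] <= h - 1)
    )
    return uv_new.T.reshape(h, w, 2), inside.reshape(h, w)

# Toy usage: a fronto-parallel plane at depth 2 and a small sideways shift.
depth = np.full((48, 64), 2.0)
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
corr, mask = lift_and_reproject(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In this toy setup every pixel shifts horizontally by `fx * tx / depth = 100 * 0.1 / 2 = 5` pixels, and pixels whose target falls outside the image are masked out. Dense pixel-to-pixel pairs of this kind are one plausible form of supervision that synthetic novel views can provide; occlusion handling is omitted here.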

