CV | Sep 10, 2025

Handling Multiple Hypotheses in Coarse-to-Fine Dense Image Matching

arXiv:2509.08805v2 | h-index: 24 | ICIP
Originality: Incremental advance
AI Analysis

The paper addresses a specific computer vision challenge relevant to applications such as 3D reconstruction, though it appears incremental because it builds on existing coarse-to-fine mechanisms.

The paper tackles erroneous matches in dense image matching at depth discontinuities or under strong zoom-ins by predicting multiple correspondent hypotheses per source location at each scale. The resulting architecture, BEAMER, is reported to be significantly more robust than state-of-the-art methods.

Dense image matching aims to find a correspondent for every pixel of a source image in a partially overlapping target image. State-of-the-art methods typically rely on a coarse-to-fine mechanism where a single correspondent hypothesis is produced per source location at each scale. In challenging cases -- such as at depth discontinuities or when the target image is a strong zoom-in of the source image -- the correspondents of neighboring source locations are often widely spread and predicting a single correspondent hypothesis per source location at each scale may lead to erroneous matches. In this paper, we investigate the idea of predicting multiple correspondent hypotheses per source location at each scale instead. We consider a beam search strategy to propagate multiple hypotheses at each scale and propose integrating these multiple hypotheses into cross-attention layers, resulting in a novel dense matching architecture called BEAMER. BEAMER learns to preserve and propagate multiple hypotheses across scales, making it significantly more robust than state-of-the-art methods, especially at depth discontinuities or when the target image is a strong zoom-in of the source image.
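
To make the coarse-to-fine beam search idea concrete, here is a minimal toy sketch in Python. It is not BEAMER's implementation: the abstract does not specify how hypotheses are scored or how they enter the cross-attention layers, so this sketch assumes a precomputed matching-score map per scale for a single source location, and assumes each scale doubles the resolution of the previous one. The names `beam_search_match`, `score_pyramid`, and `beam_width` are hypothetical and chosen for illustration only.

```python
import numpy as np


def beam_search_match(score_pyramid, beam_width=4):
    """Toy coarse-to-fine beam search for ONE source location.

    score_pyramid: list of 2D arrays, coarsest first (assumption: each level
        has twice the height and width of the previous one). Entry [r, c] is
        a hypothetical matching score for target cell (r, c); higher is better.
    beam_width: number of correspondent hypotheses kept at each scale.

    Returns the surviving (row, col) hypotheses at the finest scale,
    best first.
    """
    coarse = score_pyramid[0]
    # Keep the beam_width best target cells at the coarsest scale.
    top = np.argsort(coarse, axis=None)[::-1][:beam_width]
    beam = [tuple(int(v) for v in np.unravel_index(i, coarse.shape)) for i in top]

    for scores in score_pyramid[1:]:
        candidates = {}
        for r, c in beam:
            # Each hypothesis expands to its 2x2 children at the finer scale.
            for dr in (0, 1):
                for dc in (0, 1):
                    child = (2 * r + dr, 2 * c + dc)
                    candidates[child] = float(scores[child])
        # Prune back to the beam_width best children.
        beam = sorted(candidates, key=candidates.get, reverse=True)[:beam_width]
    return beam


# A misleading coarse level: cell (0, 0) scores highest, but the best
# fine-level match lies under the runner-up coarse cell (0, 1).
coarse = np.array([[0.9, 0.8], [0.0, 0.0]])
fine = np.zeros((4, 4))
fine[0, 0] = 0.3
fine[1, 3] = 1.0

single = beam_search_match([coarse, fine], beam_width=1)
multi = beam_search_match([coarse, fine], beam_width=2)
```

With `beam_width=1`, which mimics the single-hypothesis setting described above, the search commits to coarse cell (0, 0) and can never reach the fine cell (1, 3). With `beam_width=2`, the runner-up coarse cell survives and the better fine-level match is recovered as the top hypothesis.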
