CV · May 24, 2023

RoMa: Robust Dense Feature Matching

arXiv:2305.15404v2 · 344 citations · Has Code
Originality: Highly original
AI Analysis

This work addresses robust feature matching for computer vision applications, representing a strong incremental advance with specific gains.

The paper tackles robust dense feature matching between images under challenging real-world changes. It proposes RoMa, which combines frozen pretrained DINOv2 features with specialized ConvNet fine features and a transformer match decoder, and reports a 36% improvement on the WxBS benchmark.

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at https://github.com/Parskatt/RoMa
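The abstract's point about anchor probabilities and multimodality can be illustrated with a small, self-contained sketch. It is not taken from the RoMa code base: the 1D anchor grid, the function names, and the Charbonnier-style penalty with its constants `c` and `alpha` are all assumptions chosen for illustration. The sketch shows why classifying over anchors preserves two competing match hypotheses where a direct regression (the mean) would land between them, and how a robust penalty grows sublinearly for large residuals.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_anchors(n):
    """Hypothetical 1D anchor grid: n bin centers in normalized coords [-1, 1]."""
    step = 2.0 / n
    return [-1.0 + step * (i + 0.5) for i in range(n)]


def decode_mode(probs, anchors):
    """Regression-by-classification decode: pick the most probable anchor."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return anchors[best]


def decode_mean(probs, anchors):
    """Expectation over anchors, i.e. what a unimodal regressor would tend to."""
    return sum(p * a for p, a in zip(probs, anchors))


def charbonnier(residual, c=0.03, alpha=0.5):
    """Assumed robust penalty (generalized Charbonnier style): (r^2 + c^2)^(alpha/2).

    c and alpha are illustrative values, not taken from the text above.
    """
    return (residual * residual + c * c) ** (alpha / 2.0)


# Two plausible matches (e.g. a repeated structure): peaks at anchors 1 and 6.
anchors = make_anchors(8)
logits = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.5, 0.0]
probs = softmax(logits)

coarse = decode_mode(probs, anchors)    # lands on one of the two modes
blurred = decode_mean(probs, anchors)   # lands between the modes

# A later refinement stage would regress a small offset from `coarse`;
# the robust penalty keeps large outlier residuals from dominating.
small_penalty = charbonnier(0.01)
large_penalty = charbonnier(4.0)
```

In this toy setup the mode decode commits to one hypothesis, while the mean falls near the center of the range, far from either peak. That is the failure case a multimodal anchor distribution is meant to avoid.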

Code Implementations: 1 repo
