CV · May 3, 2022

End2End Multi-View Feature Matching with Differentiable Pose Optimization

arXiv:2205.01694v3 · 34 citations · h-index: 86
Originality: Highly original
AI Analysis

This work targets robust and efficient camera pose estimation for computer vision applications, improving on existing feature-matching methods such as SuperGlue.

The paper tackles erroneous feature matches, which degrade camera pose estimation, by addressing feature matching and pose optimization jointly. Compared to SuperGlue, it improves pose estimation on image pairs by 6.7% on ScanNet, and multi-view matching with end-to-end training improves the pose estimation metrics on Matterport3D by 18.5%. It also reduces pose estimation time by over 50% and makes RANSAC iterations unnecessary.

Erroneous feature matches have a severe impact on subsequent camera pose estimation and often require additional, time-costly measures, like RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network to predict image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and boosts pose estimation on image pairs compared to SuperGlue by 6.7% on ScanNet. At the same time, it reduces the pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict the matches all at once. Multi-view matching combined with end-to-end training improves the pose estimation metrics on Matterport3D by 18.5% compared to SuperGlue.
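
To illustrate how confidence weights can turn matches into weighted constraints, here is a minimal sketch in Python. It is not the paper's solver: it swaps in a deliberately simplified stand-in, a closed-form weighted least-squares alignment of 2D points (rotation plus translation), and the function name `weighted_rigid_2d`, the toy points, and the weights are all hypothetical. It shows only the general effect described above: a mismatched correspondence with a low weight barely influences the estimated pose, so no RANSAC-style rejection loop is needed in this toy setting.

```python
import math


def weighted_rigid_2d(src, dst, weights):
    """Closed-form weighted least-squares 2D rigid alignment (toy stand-in).

    Finds the angle theta and translation (tx, ty) that minimise
    sum_i w_i * || R(theta) * src_i + t - dst_i ||^2.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted centroids of both point sets.
    cx_s = sum(w * p[0] for w, p in zip(weights, src)) / total
    cy_s = sum(w * p[1] for w, p in zip(weights, src)) / total
    cx_d = sum(w * q[0] for w, q in zip(weights, dst)) / total
    cy_d = sum(w * q[1] for w, q in zip(weights, dst)) / total

    # Weighted dot and cross terms of the centred correspondences.
    dot = 0.0
    cross = 0.0
    for w, (px, py), (qx, qy) in zip(weights, src, dst):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_d, qy - cy_d
        dot += w * (ax * bx + ay * by)
        cross += w * (ax * by - ay * bx)

    # The optimal rotation maximises cos(theta) * dot + sin(theta) * cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # The translation maps the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


# Hypothetical toy data: four correct matches and one gross mismatch.
true_theta, true_t = 0.3, (1.0, -2.0)
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
c, s = math.cos(true_theta), math.sin(true_theta)
dst = [(c * x - s * y + true_t[0], s * x + c * y + true_t[1]) for x, y in src]
dst[4] = (10.0, 10.0)  # simulated erroneous match

uniform_theta, _ = weighted_rigid_2d(src, dst, [1.0] * 5)
weighted_theta, _ = weighted_rigid_2d(src, dst, [1.0, 1.0, 1.0, 1.0, 0.01])
print("angle error, uniform weights:   ", abs(uniform_theta - true_theta))
print("angle error, confidence weights:", abs(weighted_theta - true_theta))
```

In this sketch the weights are fixed by hand. In the approach described above they are predicted by the network and learned through the gradients of the pose optimization, which a closed-form toy like this one does not attempt to reproduce.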

