CV · AI · LG · RO · Mar 30, 2025

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

arXiv:2503.23502v3 · 1 citation · h-index: 6 · IROS
Originality: Incremental advance
AI Analysis

This work addresses omnidirectional depth perception for mobile robotics and represents an incremental improvement over existing methods.

The paper tackles the limited depth accuracy of omnidirectional stereo matching for mobile robotics by leveraging a pre-trained depth foundation model, reducing disparity mean absolute error (MAE) on the real-world Helvipad dataset by approximately 16% relative to the previous best omnidirectional stereo method.
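To make the headline number concrete, the sketch below shows how a disparity MAE and a relative reduction such as "approximately 16%" are typically computed. It is a minimal illustration under stated assumptions, not the paper's or Helvipad's evaluation code: the function names and the optional `valid` mask (for sparse, e.g. LiDAR-derived, ground truth) are hypothetical, and the numbers in the usage lines are made up.

```python
from typing import Optional, Sequence


def disparity_mae(pred: Sequence[float], gt: Sequence[float],
                  valid: Optional[Sequence[bool]] = None) -> float:
    """Mean absolute disparity error over pixels that have ground truth.

    `valid` is a hypothetical per-pixel mask (an assumption): sparse
    ground truth typically leaves many pixels unscored.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to score")
    return sum(errors) / len(errors)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Fraction by which new_mae improves on baseline_mae (0.16 means 16%)."""
    return (baseline_mae - new_mae) / baseline_mae


# Toy, made-up numbers: these are NOT results from the paper.
print(disparity_mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0]))                       # 1.0
print(disparity_mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0], [True, True, False]))  # 0.5
print(round(relative_reduction(0.50, 0.42), 2))                              # 0.16
```

A relative reduction is independent of the absolute error scale, which is why the summary can state the improvement as a percentage without quoting either MAE value.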

Omnidirectional depth perception is essential for mobile robotics applications that require scene understanding across a full 360° field of view. Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps without relying on expensive active sensing. However, existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments, depth ranges, and lighting conditions, due to the scarcity of real-world data. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation within an iterative optimization-based stereo matching architecture. We introduce a dedicated two-stage training strategy to utilize the relative monocular depth features for our omnidirectional stereo matching before scale-invariant fine-tuning. DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, reducing disparity MAE by approximately 16% compared to the previous best omnidirectional stereo method.
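The abstract does not state which objective the scale-invariant fine-tuning stage uses. One widely used scale-invariant objective is the scale-invariant logarithmic (SILog) loss of Eigen et al. (2014), sketched below in plain Python purely as an illustration; treating it as the paper's loss is an assumption. With `lam = 1` the loss ignores any global scale factor on the prediction, which is the property that lets relative (up-to-scale) monocular depth be supervised against metric ground truth; `lam = 0` reduces it to a plain squared log error. Inputs are assumed to be strictly positive values such as depths.

```python
import math
from typing import Sequence


def silog_loss(pred: Sequence[float], gt: Sequence[float], lam: float = 0.5) -> float:
    """Scale-invariant log loss over strictly positive values (illustrative).

    d_i  = log(pred_i) - log(gt_i)
    loss = mean(d_i^2) - lam * mean(d_i)^2
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    # Per-element log-ratio between prediction and ground truth.
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    # Subtracting lam * mean^2 discounts a shared (global-scale) log offset.
    return mean_sq - lam * mean * mean


gt = [1.0, 2.0, 4.0]
pred = [2.0 * x for x in gt]           # correct up to a global factor of 2
print(silog_loss(pred, gt, lam=1.0))   # ~0, up to floating-point rounding
print(silog_loss(pred, gt, lam=0.0))   # ~log(2)**2, about 0.48
```

Intermediate values of `lam` trade off between penalizing absolute scale errors and only penalizing relative structure; Eigen et al. used 0.5 during training.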

Code Implementations: 1 repo
