CV | Dec 5, 2024

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

arXiv:2412.04472v20.0746 citations | h-index: 43 | CVPR
AI Analysis: 85

This paper addresses stereo matching in textureless or reflective scenes for computer vision applications. It is a novel hybrid approach rather than an incremental improvement.

The paper tackles robust zero-shot stereo matching under challenging conditions by combining geometric constraints with monocular depth priors. It reports state-of-the-art results on benchmarks and robustness to mirrors and transparent surfaces.

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues. Following this design, our framework introduces novel cost volume fusion mechanisms that effectively handle critical challenges such as textureless regions, occlusions, and non-Lambertian surfaces. Through our novel optical illusion dataset, MonoTrap, and extensive evaluation across multiple benchmarks, we demonstrate that our synthetic-only trained model achieves state-of-the-art results in zero-shot generalization, significantly outperforming existing solutions while showing remarkable robustness to challenging cases such as mirrors and transparencies.
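To make the idea of combining stereo geometry with a monocular prior concrete, here is a minimal sketch. It is not the paper's method: the abstract describes a learned dual-branch network with cost volume fusion, whereas this example is a much simpler post-hoc blend of two disparity estimates. It assumes a monocular prior known only up to scale and shift and a hypothetical per-pixel stereo confidence in [0, 1]. The function names `fit_scale_shift` and `fuse_disparity` are illustrative only.

```python
from typing import List, Tuple


def fit_scale_shift(mono: List[float], stereo: List[float],
                    conf: List[float]) -> Tuple[float, float]:
    """Weighted least squares fit of stereo ~= a * mono + b.

    Pixels with higher stereo confidence contribute more to the fit.
    """
    w_sum = sum(conf)
    if w_sum == 0:
        raise ValueError("no confident stereo pixels to align against")
    mean_m = sum(w * m for w, m in zip(conf, mono)) / w_sum
    mean_s = sum(w * s for w, s in zip(conf, stereo)) / w_sum
    var_m = sum(w * (m - mean_m) ** 2 for w, m in zip(conf, mono))
    if var_m == 0:
        raise ValueError("monocular prior is constant on confident pixels")
    cov = sum(w * (m - mean_m) * (s - mean_s)
              for w, m, s in zip(conf, mono, stereo))
    a = cov / var_m
    b = mean_s - a * mean_m
    return a, b


def fuse_disparity(mono: List[float], stereo: List[float],
                   conf: List[float]) -> List[float]:
    """Blend stereo disparity with the aligned monocular prior.

    Assumption for illustration: conf[i] in [0, 1] is the trust placed in
    the stereo estimate at pixel i (for example, low on a mirror).
    """
    a, b = fit_scale_shift(mono, stereo, conf)
    aligned = [a * m + b for m in mono]
    return [c * s + (1.0 - c) * p for c, s, p in zip(conf, stereo, aligned)]


if __name__ == "__main__":
    mono = [0.1, 0.2, 0.3, 0.4]        # relative prior, arbitrary scale
    stereo = [10.0, 20.0, 99.0, 40.0]  # third pixel is a stereo failure
    conf = [1.0, 1.0, 0.0, 1.0]        # stereo is not trusted at the third pixel
    print(fuse_disparity(mono, stereo, conf))
```

In this toy input the three trusted pixels determine the scale and shift, and the untrusted third pixel is replaced by the aligned monocular value (about 30) instead of the erroneous stereo value of 99. A learned fusion inside the cost volume, as the abstract describes, can also handle the opposite case where the monocular prior is the one that is wrong.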

Code Implementations: 1 repo