CV | Nov 21, 2024

Stereo Anything: Unifying Zero-shot Stereo Matching with Large-Scale Mixed Data

arXiv:2411.14053v3 | 2 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the domain generalization challenge in stereo matching for 3D vision applications, offering a scalable data paradigm that is incremental but impactful for improving model robustness.

The paper tackles the problem of stereo matching models degrading in unseen domains due to limited training data diversity, and introduces StereoAnything, a data-centric framework that unifies heterogeneous stereo sources to enhance zero-shot generalization, achieving state-of-the-art results on four public benchmarks.

Stereo matching is a cornerstone of 3D vision: it establishes pixel-wise correspondences between stereo image pairs to recover depth. Despite remarkable progress driven by deep neural architectures, current models often degrade severely when deployed in unseen domains, primarily because of the limited diversity of training data. In this work, we introduce StereoAnything, a data-centric framework that substantially enhances the zero-shot generalization capability of existing stereo models. Rather than devising yet another specialized architecture, we scale stereo training to an unprecedented level by systematically unifying heterogeneous stereo sources: (1) curated labeled datasets covering diverse environments, and (2) large-scale synthetic stereo pairs generated from unlabeled monocular images. Our mixed-data strategy delivers consistent and robust learning signals across domains, effectively mitigating dataset bias. Extensive zero-shot evaluations on four public benchmarks demonstrate that StereoAnything achieves state-of-the-art generalization. This work paves the way towards truly universal stereo matching, offering a scalable data paradigm applicable to any stereo image pair. Code is available at https://github.com/XiandaGuo/OpenStereo.
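To make the idea of generating a stereo pair from a single monocular image more concrete, here is a minimal sketch. It is not the paper's pipeline, which this page does not describe. It assumes a per-pixel disparity map is already available (for example, derived from a monocular depth estimate) and forward-warps the left image into a right view. The function name `synthesize_right_view` and its interface are hypothetical, chosen only for illustration.

```python
import numpy as np


def synthesize_right_view(left, disparity):
    """Forward-warp a left image into a hypothetical right view.

    left:      (H, W, C) image array.
    disparity: (H, W) non-negative disparities in pixels (assumed given).
    Returns (right, valid), where valid marks right-view pixels that
    received a value; the rest are holes (disocclusions) left at zero.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    valid = np.zeros((h, w), dtype=bool)
    # Largest disparity written so far at each target pixel. Larger
    # disparity means closer to the camera, so it wins on collisions.
    best = np.full((h, w), -np.inf)
    xs = np.arange(w)
    for y in range(h):
        # A left pixel at column x appears at column x - d in the right view.
        target = np.rint(xs - disparity[y]).astype(int)
        for x in range(w):
            t = target[x]
            if 0 <= t < w and disparity[y, x] > best[y, t]:
                best[y, t] = disparity[y, x]
                right[y, t] = left[y, x]
                valid[y, t] = True
    return right, valid


if __name__ == "__main__":
    # Toy example: a random image and a constant 2-pixel disparity.
    rng = np.random.default_rng(0)
    left = rng.integers(0, 256, size=(4, 8, 3), dtype=np.uint8)
    disparity = np.full((4, 8), 2.0)
    right, valid = synthesize_right_view(left, disparity)
    print(right.shape, int(valid.sum()))
```

The triplet (left image, synthesized right image, disparity map) could then serve as a pseudo-labeled training sample. A practical pipeline would also need to fill the holes marked by `valid == False`, which this sketch leaves untouched.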

Code Implementations: 1 repo
