CV · Nov 21, 2024

Stereo Anything: Unifying Zero-shot Stereo Matching with Large-Scale Mixed Data

Xianda Guo, Chenming Zhang, Youmin Zhang, Ruilin Wang, Dujun Nie, Wenzhao Zheng, Matteo Poggi, Hao Zhao, Mang Ye, Qin Zou, Long Chen

arXiv:2411.14053v3 · 6.52 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the domain generalization challenge in stereo matching for 3D vision applications. It offers a scalable data paradigm that is incremental in originality but meaningfully improves model robustness.

The paper tackles the problem of stereo matching models degrading in unseen domains due to limited training data diversity. It introduces StereoAnything, a data-centric framework that unifies heterogeneous stereo sources to improve zero-shot generalization, and reports state-of-the-art zero-shot results on four public benchmarks.
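Unifying heterogeneous stereo sources implies some policy for how often each source is drawn during training. The text does not specify that policy, so the following is only a minimal sketch under an assumed design: each source gets a hypothetical sampling weight, and training samples are drawn by weight rather than by simply concatenating datasets. The function name `build_mixed_schedule`, the source names, and the weights are all illustrative, not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_mixed_schedule(
    sources: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Hypothetical weighted mixing of several stereo datasets.

    Each entry in the returned schedule is (source_name, sample_id).
    A source is picked in proportion to its weight, then one of its
    samples is picked uniformly, so a small curated dataset is not
    drowned out by a much larger synthetic one.
    """
    rng = random.Random(seed)  # seeded for a reproducible schedule
    names = sorted(sources)    # fixed order so the seed fully determines output
    source_weights = [weights[name] for name in names]
    schedule: List[Tuple[str, str]] = []
    for name in rng.choices(names, weights=source_weights, k=num_samples):
        schedule.append((name, rng.choice(sources[name])))
    return schedule


# Illustrative usage with made-up source names and weights.
if __name__ == "__main__":
    demo = build_mixed_schedule(
        {"labeled_outdoor": ["o1", "o2"], "synthetic_from_mono": ["s1", "s2", "s3"]},
        {"labeled_outdoor": 1.0, "synthetic_from_mono": 3.0},
        num_samples=8,
    )
    print(demo)
```

Weighting by source, rather than by raw dataset size, is one way to keep the learning signal balanced across domains; the actual ratios used by StereoAnything would need to be taken from the paper or its code.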

Stereo matching serves as a cornerstone of 3D vision, aiming to establish pixel-wise correspondences between stereo image pairs for depth recovery. Despite remarkable progress driven by deep neural architectures, current models often degrade severely when deployed in unseen domains, primarily because of the limited diversity of their training data. In this work, we introduce StereoAnything, a data-centric framework that substantially enhances the zero-shot generalization capability of existing stereo models. Rather than devising yet another specialized architecture, we scale stereo training to an unprecedented level by systematically unifying heterogeneous stereo sources: (1) curated labeled datasets covering diverse environments, and (2) large-scale synthetic stereo pairs generated from unlabeled monocular images. Our mixed-data strategy delivers consistent and robust learning signals across domains, effectively mitigating dataset bias. Extensive zero-shot evaluations on four public benchmarks demonstrate that StereoAnything achieves state-of-the-art generalization. This work paves the way toward truly universal stereo matching, offering a scalable data paradigm applicable to any stereo image pair. Code is available at https://github.com/XiandaGuo/OpenStereo.
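The abstract does not say how synthetic stereo pairs are produced from unlabeled monocular images. One common recipe, sketched here as an assumption rather than the paper's confirmed pipeline, is to estimate depth Z with a monocular depth network, convert it to disparity with d = f·B/Z for a chosen focal length f (pixels) and virtual baseline B, and forward-warp the image so that a left pixel at column x lands at column x − d in a synthesized right view. The helper names `depth_to_disparity` and `synthesize_right_view` and their parameters are hypothetical; images are plain nested lists to keep the sketch dependency-free.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # grayscale image or per-pixel map, as rows


def depth_to_disparity(depth: Grid, focal_px: float, baseline: float) -> Grid:
    """Convert metric depth to disparity via d = f * B / Z (hypothetical helper)."""
    out: Grid = []
    for row in depth:
        if any(z <= 0 for z in row):
            raise ValueError("depth values must be positive")
        out.append([focal_px * baseline / z for z in row])
    return out


def synthesize_right_view(
    left: Grid, disparity: Grid
) -> Tuple[List[List[Optional[float]]], List[List[bool]]]:
    """Forward-warp a left image into a synthetic right view.

    Uses nearest-pixel splatting: the left pixel at column x moves to
    column round(x - d). When several pixels land on the same target,
    the one with the largest disparity (closest to the camera) wins.
    Targets that receive no pixel are disocclusion holes, returned as
    None in the image and False in the validity mask.
    """
    h, w = len(left), len(left[0])
    right: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    best = [[float("-inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disparity[y][x]
            xt = int(round(x - d))
            if 0 <= xt < w and d > best[y][xt]:
                best[y][xt] = d
                right[y][xt] = left[y][x]
    valid = [[v is not None for v in row] for row in right]
    return right, valid
```

For a single row `[[1, 2, 3]]` with disparity `[[0, 0, 1]]`, the third pixel shifts onto column 1 and occludes the second, leaving a hole at column 2. A practical pipeline would likely add subpixel splatting and hole inpainting; the synthesized right view plus the disparity map then forms one training pair.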
