CV · Nov 20, 2025

Lite Any Stereo: Efficient Zero-Shot Stereo Matching

arXiv:2511.16555v1
h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, generalizable stereo depth estimation models, which are crucial for real-world applications such as robotics and autonomous driving. The advance is incremental, improving on existing methods.

The paper tackles the trade-off between accuracy and efficiency in stereo matching. It introduces Lite Any Stereo, a framework that combines strong zero-shot generalization with high efficiency: it ranks 1st on four real-world benchmarks while requiring less than 1% of the computational cost of state-of-the-art non-prior-based accurate methods.

Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot ability due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding state-of-the-art non-prior-based accurate methods while requiring less than 1% computational cost, setting a new standard for efficient stereo matching.

