CV | Dec 13, 2024

All-in-One: Transferring Vision Foundation Models into Stereo Matching

arXiv:2412.09912v1 | 13 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem in computer vision by incrementally advancing stereo matching through better feature utilization from VFMs.

The paper tackles the problem of improving stereo matching by transferring knowledge from multiple vision foundation models (VFMs) to enhance feature extraction, resulting in state-of-the-art performance with top rankings on the Middlebury and ETH3D benchmarks.

As a fundamental vision task, stereo matching has made remarkable progress. While recent iterative optimization-based methods have achieved promising performance, their feature extraction capabilities still have room for improvement. Inspired by the ability of vision foundation models (VFMs) to extract general representations, we propose AIO-Stereo, which can flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model. To better reconcile features between heterogeneous VFMs and the stereo matching model, and to fully exploit prior knowledge from VFMs, we propose a dual-level feature utilization mechanism that aligns heterogeneous features and transfers multi-level knowledge. Based on this mechanism, a dual-level selective knowledge transfer module is designed to selectively transfer knowledge and integrate the advantages of multiple VFMs. Experimental results show that AIO-Stereo achieves state-of-the-art performance on multiple datasets, ranks 1st on the Middlebury dataset, and outperforms all published work on the ETH3D benchmark.
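The abstract does not spell out how the selective transfer module is built, so the following is only a toy sketch of the general idea of "align, then select, then transfer" for a single feature vector. Everything in it is an assumption made for illustration: the function names (`linear`, `selective_fuse`), the use of a linear projection for alignment, and the softmax-over-similarity gating are hypothetical and are not taken from the paper.

```python
import math


def linear(x, weight):
    """Project vector x (length d_in) with a weight matrix given as d_out rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def selective_fuse(stereo_feat, vfm_feats, projections):
    """Hypothetical fusion of several VFM features into one stereo feature.

    1. Align: project each VFM feature to the stereo feature's dimension.
    2. Select: weight each aligned feature by a softmax over its scaled
       similarity to the stereo feature (an assumed gating rule).
    3. Transfer: add the weighted sum to the stereo feature (residual).
    """
    aligned = [linear(f, p) for f, p in zip(vfm_feats, projections)]
    scale = math.sqrt(len(stereo_feat))
    weights = softmax([dot(stereo_feat, a) / scale for a in aligned])
    fused = list(stereo_feat)
    for w, a in zip(weights, aligned):
        for i, v in enumerate(a):
            fused[i] += w * v
    return fused, weights


# Two made-up "VFMs" with different feature sizes (3 and 4) and a
# 2-dimensional stereo feature; the projections simply keep the
# first two channels of each VFM feature.
stereo_feat = [1.0, 0.0]
vfm_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
projections = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
]
fused, weights = selective_fuse(stereo_feat, vfm_feats, projections)
```

In this toy setup the first source aligns with the stereo feature and so receives the larger weight. A real model would operate on dense feature maps with learned projections and, per the abstract, at two levels rather than one.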

