CV | Mar 30, 2024

Reusable Architecture Growth for Continual Stereo Matching

arXiv:2404.00360v1 | 5 citations | h-index: 59 | IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

The paper addresses the practical deployment of stereo matching in applications such as autonomous driving, where training data arrive continuously. The contribution is incremental, as it builds on existing neural architecture search and continual learning techniques.

The paper tackles the problem of continual learning for stereo depth estimation, where models must learn new scenes without forgetting previous ones, and demonstrates that their Reusable Architecture Growth (RAG) framework surpasses state-of-the-art methods in cross-dataset settings.

Recent stereo depth estimation models owe much of their remarkable performance to convolutional neural networks that regress dense disparity. As in most tasks, this requires training data that cover the heterogeneous scenes encountered at deployment time. However, in practical applications training samples are typically acquired continuously, which makes the ability to learn new scenes continually even more crucial. To this end, we propose continual stereo matching, in which a model must 1) continually learn new scenes, 2) avoid forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised settings. It maintains high reusability during growth by reusing previously learned units while still achieving good performance. Additionally, we present a Scene Router module that adaptively selects the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively across various weather, road, and city conditions and surpasses state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
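The abstract gives no implementation details, so the following is only a rough, hypothetical illustration of two ideas it names: reusing previous units versus growing new ones, and selecting a scene-specific path at inference. Each layer keeps a pool of unit ids; for a new scene, a layer reuses its best existing unit if that unit's estimated score is within a `tolerance` of a freshly grown unit, and otherwise grows a new one. The scoring hook, the `tolerance` parameter, and the nearest-prototype router are all assumptions made for this sketch. They stand in for, and do not reproduce, RAG's neural unit search and its Scene Router.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical scoring hook: estimated disparity accuracy on the new scene when
# layer `layer` uses existing unit `unit`, or a freshly trained unit if unit is None.
ScoreFn = Callable[[int, Optional[int]], float]


class GrowingStereoNet:
    """Toy bookkeeping for reuse-or-grow architecture growth plus a scene router.

    Units are plain integer ids per layer; in a real network each id would be a
    block of layers trained on the scene that created it and then frozen.
    """

    def __init__(self, num_layers: int, tolerance: float = 0.01) -> None:
        self.num_layers = num_layers
        # Accuracy we are willing to give up in exchange for reusing a unit.
        self.tolerance = tolerance
        self.pool: List[List[int]] = [[] for _ in range(num_layers)]
        self.paths: Dict[str, List[int]] = {}
        self.prototypes: Dict[str, List[float]] = {}

    def learn_scene(self, scene: str, features: Sequence[Sequence[float]],
                    score: ScoreFn) -> List[int]:
        """Choose, layer by layer, to reuse an old unit or grow a new one."""
        path: List[int] = []
        for layer in range(self.num_layers):
            units = self.pool[layer]
            fresh = score(layer, None)
            best = max(units, key=lambda u: score(layer, u), default=None)
            if best is not None and score(layer, best) >= fresh - self.tolerance:
                path.append(best)          # reuse an existing unit id
            else:
                units.append(len(units))   # grow: register a new unit id
                path.append(units[-1])
        self.paths[scene] = path
        # Mean feature vector of the scene, used by the router at inference.
        dim = len(features[0])
        self.prototypes[scene] = [sum(f[i] for f in features) / len(features)
                                  for i in range(dim)]
        return path

    def route(self, feature: Sequence[float]) -> List[int]:
        """Return the path of the scene whose prototype is nearest (Euclidean)."""
        scene = min(self.prototypes,
                    key=lambda s: math.dist(feature, self.prototypes[s]))
        return self.paths[scene]


if __name__ == "__main__":
    net = GrowingStereoNet(num_layers=3)
    # First scene: the pools are empty, so every layer grows a unit.
    print(net.learn_scene("city", [[0.0, 0.0], [0.2, 0.0]], lambda layer, unit: 0.9))
    # -> [0, 0, 0]

    def rain_score(layer: int, unit: Optional[int]) -> float:
        # Made-up numbers: old units transfer well except at the last layer.
        if unit is None:
            return 0.90
        return 0.895 if layer < 2 else 0.70

    print(net.learn_scene("rain", [[1.0, 1.0], [1.2, 0.8]], rain_score))
    # -> [0, 0, 1]
    print(net.route([1.0, 1.0]))  # -> [0, 0, 1] (nearest prototype: "rain")
```

In this toy version the second scene reuses two of three units and grows only one. Because the path stored for "city" is never modified, routing an input back to that scene still yields its original path. The trade-off is that the pool grows whenever no existing unit is good enough, and `tolerance` controls how readily reuse is preferred over growth.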

