Matching-space Stereo Networks for Cross-domain Generalization
This work addresses the domain-shift problem in stereo matching for computer vision applications. It offers an incremental improvement: the network architecture is modified to enhance generalization without sacrificing performance on the source domain.
The paper tackles the poor cross-domain generalization of stereo matching networks by introducing Matching-Space Networks (MS-Nets). These replace learning-based feature extraction with matching functions and confidence measures, which avoids over-specialization to domain-specific features. The result is superior generalization to unseen environments, with accuracy on the source domain maintained.
End-to-end deep networks represent the state of the art for stereo matching. While excelling on images framing environments similar to the training set, major drops in accuracy occur in unseen domains (e.g., when moving from synthetic to real scenes). In this paper, we introduce a novel family of architectures, namely Matching-Space Networks (MS-Nets), with improved generalization properties. By replacing learning-based feature extraction from image RGB values with matching functions and confidence measures from conventional wisdom, we move the learning process from the color space to the Matching Space, avoiding over-specialization to domain-specific features. Extensive experimental results on four real datasets highlight that our proposal leads to superior generalization to unseen environments over conventional deep architectures, keeping accuracy on the source domain almost unaltered. Our code is available at https://github.com/ccj5351/MS-Nets.
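
To make the idea of learning in the Matching Space rather than the color space more concrete, the following is a minimal sketch of how hand-crafted matching costs and a confidence value could be computed for one pair of rectified scanlines. It is not the MS-Nets implementation. The choice of absolute difference as the matching function, the margin to the runner-up cost as the confidence measure, the assumption that intensities lie in [0, 1], and all names (`sad_costs`, `matching_space_features`, `invalid`) are illustrative assumptions.

```python
import random


def sad_costs(left, right, max_disp, radius=1, invalid=1.0):
    """Return costs[x][d]: mean absolute difference between the window
    around left[x] and the window around right[x - d] on one scanline.

    Assumes intensities in [0, 1], so `invalid` (1.0) is the worst cost.
    """
    n = len(left)
    costs = []
    for x in range(n):
        row = []
        for d in range(max_disp):
            if x - d < 0:
                # No counterpart in the right scanline for this disparity.
                row.append(invalid)
                continue
            total, count = 0.0, 0
            for o in range(-radius, radius + 1):
                xl, xr = x + o, x - d + o
                if 0 <= xl < n and 0 <= xr < n:
                    total += abs(left[xl] - right[xr])
                    count += 1
            row.append(total / count)  # count >= 1 because o == 0 is valid
        costs.append(row)
    return costs


def matching_space_features(costs):
    """Per pixel: (best disparity, best cost, margin to the runner-up cost).

    The margin is a simple stand-in for a confidence measure; it requires
    at least two disparity hypotheses per pixel.
    """
    feats = []
    for row in costs:
        order = sorted(range(len(row)), key=row.__getitem__)
        best, second = order[0], order[1]
        feats.append((best, row[best], row[second] - row[best]))
    return feats


# Hypothetical usage: a random scanline and a copy shifted by 3 pixels.
rng = random.Random(0)
left = [rng.random() for _ in range(16)]
right = left[3:] + [0.0] * 3
features = matching_space_features(sad_costs(left, right, max_disp=6))
```

In this sketch, the per-pixel tuples take the place that raw RGB values would occupy as network input. Because they describe how well two patches agree rather than what the patches look like, they depend less on the appearance statistics of a particular dataset, which is the intuition behind the generalization claim above.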