Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods
This work addresses a potential performance gap in stereo vision that matters to researchers and practitioners, but it is incremental: it focuses on benchmarking and analysis rather than proposing new methods.
The paper asks whether active stereo patterns degrade deep learning-based stereo matching methods, and introduces the Active-Passive SimStereo dataset and benchmark to answer it. The results show that feature extraction and matching modules generalize well, but the disparity refinement modules of three of the twenty evaluated architectures are negatively affected because they rely on image appearance.
In stereo vision, self-similar or bland regions make it difficult to match patches between two images. Active stereo methods mitigate this problem by projecting a pseudo-random pattern onto the scene so that each patch of an image pair can be identified without ambiguity. However, the projected pattern significantly alters the appearance of the images. If this pattern acts as a form of adversarial noise, it could degrade the performance of deep learning-based methods, which are now the de facto standard for dense stereo vision. In this paper, we propose the Active-Passive SimStereo dataset and a corresponding benchmark to evaluate the performance gap between passive and active stereo images for stereo matching algorithms. Using the proposed benchmark and an additional ablation study, we show that the feature extraction and matching modules of twenty selected deep learning-based stereo matching methods generalize to active stereo without difficulty. However, the disparity refinement modules of three of the twenty architectures (ACVNet, CascadeStereo, and StereoNet) are negatively affected by the active stereo patterns because they rely on the appearance of the input images.
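To make the ambiguity argument concrete, here is a toy one-dimensional sketch, not taken from the paper. It assumes a noise-free, fronto-parallel surface at a constant disparity, models the projected pattern as per-pixel uniform random intensities attached to the surface, and uses classical winner-take-all SAD block matching rather than any of the deep networks studied in the paper. All function names and parameters are hypothetical. On the bland surface every candidate disparity has the same cost, so the match is ambiguous; with the pattern, only the true disparity has zero cost.

```python
import random


def render_pair(texture, disparity, width):
    """Render a left/right scanline pair of a fronto-parallel surface.

    Left pixel x sees texture[x]; the right camera sees the same surface
    point at x - disparity, so right[x] = texture[x + disparity].
    """
    left = texture[:width]
    right = texture[disparity:disparity + width]
    return left, right


def block_match(left, right, max_disp, half_win):
    """Winner-take-all SAD block matching; returns {x: estimated disparity}."""
    estimates = {}
    for x in range(max_disp + half_win, len(left) - half_win):
        def sad(d):
            return sum(abs(left[x + k] - right[x + k - d])
                       for k in range(-half_win, half_win + 1))
        # min() keeps the first candidate on ties, so a flat cost curve
        # (an ambiguous match) collapses to disparity 0.
        estimates[x] = min(range(max_disp + 1), key=sad)
    return estimates


def mean_abs_error(estimates, true_disp):
    """Mean absolute disparity error over all matched pixels."""
    return sum(abs(d - true_disp) for d in estimates.values()) / len(estimates)


def compare_passive_active(width=64, true_disp=5, max_disp=10, half_win=2, seed=0):
    """Toy passive vs. active comparison (hypothetical setup, not the benchmark)."""
    rng = random.Random(seed)
    n = width + true_disp
    bland = [0.5] * n  # passive: textureless surface
    # active (assumed model): the same surface plus a pseudo-random projected pattern
    projected = [0.5 + 0.5 * rng.random() for _ in range(n)]
    errors = {}
    for name, texture in (("passive", bland), ("active", projected)):
        left, right = render_pair(texture, true_disp, width)
        estimates = block_match(left, right, max_disp, half_win)
        errors[name] = mean_abs_error(estimates, true_disp)
    return errors


print(compare_passive_active())  # {'passive': 5.0, 'active': 0.0}
```

The sketch only illustrates why the pattern helps correspondence search. It says nothing about the paper's actual question, namely whether learned refinement modules that depend on image appearance are hurt by the altered look of the images.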