SR-Stereo & DAPE: Stepwise Regression and Pre-trained Edges for Practical Stereo Matching
This work addresses generalization and domain adaptation challenges for stereo matching in practical applications such as autonomous driving, though it appears incremental because it builds on existing iteration-based methods.
The paper tackles the problem of domain discrepancies in stereo matching by proposing SR-Stereo, a stepwise regression architecture that regresses the disparity error through multiple range-controlled clips, and DAPE, a domain adaptation method that fine-tunes with pre-trained edges and sparse ground truth. Results show that SR-Stereo achieves competitive in-domain and cross-domain performance on datasets such as SceneFlow and KITTI, while DAPE significantly improves the performance of the fine-tuned model, especially in texture-less and detailed regions.
Due to the difficulty of obtaining real samples and ground truth, generalization performance and domain adaptation performance are critical for the feasibility of stereo matching methods in practical applications. However, there are significant distributional discrepancies among different domains, which pose challenges for the generalization and domain adaptation of the model. Inspired by iteration-based methods, we propose a novel stepwise regression architecture. This architecture regresses the disparity error through multiple range-controlled clips, which effectively overcomes domain discrepancies. We implement this architecture on top of iteration-based methods and refer to the resulting stereo method as SR-Stereo. Specifically, a new stepwise regression unit is proposed to replace the original update unit in order to control the range of the output. Meanwhile, a regression objective segment is proposed to set the supervision individually for each stepwise regression unit. In addition, to enhance the edge awareness of models adapting to new domains with sparse ground truth, we propose Domain Adaptation based on Pre-trained Edges (DAPE). In DAPE, a pre-trained stereo model and an edge estimator are used to estimate the edge maps of the target domain images, which, along with the sparse ground truth disparity, are used to fine-tune the stereo model. The proposed SR-Stereo and DAPE are extensively evaluated on SceneFlow, KITTI, Middlebury 2014 and ETH3D. Compared with the SOTA methods and generalized methods, the proposed SR-Stereo achieves competitive in-domain and cross-domain performance. Meanwhile, the proposed DAPE significantly improves the performance of the fine-tuned model, especially in texture-less and detailed regions.
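To make the idea of regressing the disparity error through range-controlled clips more concrete, here is a minimal sketch for a single pixel. It is not the authors' implementation: the symmetric clipping range `limit`, the function names `clip` and `segment_targets`, and the assumption that every step reaches its target exactly are all illustrative choices that do not come from the abstract. The sketch only shows how a large error can be split into bounded per-step targets, one for each stepwise regression unit.

```python
def clip(value, limit):
    """Clamp value to the symmetric range [-limit, limit] (assumed range shape)."""
    return max(-limit, min(limit, value))


def segment_targets(initial, ground_truth, limit, steps):
    """Split the disparity error into per-step targets, each bounded by limit.

    Hypothetical helper: returns the list of per-step targets and the
    disparity reached after applying all of them, assuming every step
    predicts its target exactly.
    """
    current = initial
    targets = []
    for _ in range(steps):
        # The remaining error is clipped, so no single step has to
        # regress more than `limit` pixels of disparity.
        target = clip(ground_truth - current, limit)
        targets.append(target)
        current += target
    return targets, current


targets, final = segment_targets(initial=0.0, ground_truth=5.0, limit=2.0, steps=4)
print(targets, final)
```

With an initial disparity of 0, a ground truth of 5 and a range of 2, the per-step targets are 2, 2, 1 and 0, and the final disparity equals the ground truth. Each unit sees a target with a bounded magnitude regardless of how large the total disparity error is, which is one plausible reading of why a controlled output range would be less sensitive to the disparity distribution of a given domain.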