CV | Jun 27, 2023

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

arXiv:2306.15612v2 | 30 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses how ground-truth distributions are modeled when stereo matching networks are trained with a cross-entropy loss, offering an incremental improvement over existing loss functions.

The paper tackles accurate disparity map recovery in stereo matching by proposing an adaptive multi-modal cross-entropy loss (ADL) that models multi-modal ground-truth distributions for edge pixels. With this loss, GANet ranks 1st among published methods on both the KITTI 2015 and 2012 benchmarks.

Despite the great success of deep learning in stereo matching, recovering accurate disparity maps is still challenging. Currently, L1 and cross-entropy are the two most widely used losses for stereo network training. Compared with the former, the latter usually performs better thanks to its probability modeling and direct supervision of the cost volume. However, how to accurately model the stereo ground-truth for cross-entropy loss remains largely under-explored. Existing works simply assume that the ground-truth distributions are uni-modal, which ignores the fact that most of the edge pixels can be multi-modal. In this paper, a novel adaptive multi-modal cross-entropy loss (ADL) is proposed to guide the networks to learn different distribution patterns for each pixel. Moreover, we optimize the disparity estimator to further alleviate the bleeding or misalignment artifacts in inference. Extensive experimental results show that our method is generic and can help classic stereo networks regain state-of-the-art performance. In particular, GANet with our method ranks 1st on both the KITTI 2015 and 2012 benchmarks among the published methods. Meanwhile, excellent synthetic-to-realistic generalization performance can be achieved by simply replacing the traditional loss with ours.
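The abstract does not say which distribution the targets follow or how the disparity estimator is changed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a discrete Laplacian shape for each mode, hand-picked mixture weights for an edge pixel, and a windowed expectation around the strongest peak as the estimator. All function names, the `scale` and `radius` parameters, and the example disparities are hypothetical.

```python
import math

def laplacian_target(center, num_disp, scale=1.0):
    """Discrete Laplacian over disparities 0..num_disp-1, normalised to sum to 1.
    The Laplacian shape is an assumption made for this sketch."""
    weights = [math.exp(-abs(d - center) / scale) for d in range(num_disp)]
    total = sum(weights)
    return [w / total for w in weights]

def mixture_target(centers, mix_weights, num_disp, scale=1.0):
    """Weighted mixture of modes: one hypothetical form of a multi-modal target."""
    modes = [laplacian_target(c, num_disp, scale) for c in centers]
    return [sum(w * m[d] for w, m in zip(mix_weights, modes))
            for d in range(num_disp)]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def soft_argmax(prob):
    """Expected disparity over the full distribution."""
    return sum(d * p for d, p in enumerate(prob))

def dominant_mode_estimate(prob, radius=2):
    """Expected disparity restricted to a window around the strongest peak,
    renormalised over that window (an assumed estimator, for illustration)."""
    peak = max(range(len(prob)), key=prob.__getitem__)
    lo, hi = max(0, peak - radius), min(len(prob), peak + radius + 1)
    mass = sum(prob[lo:hi])
    return sum(d * prob[d] for d in range(lo, hi)) / mass

NUM_DISP = 32

# Hypothetical edge pixel: foreground at disparity 20, background at 5.
target_uni = laplacian_target(20, NUM_DISP)
target_multi = mixture_target([20, 5], [0.7, 0.3], NUM_DISP)

# Two made-up network outputs: one bimodal, one strictly uni-modal.
pred_bimodal = mixture_target([20, 5], [0.6, 0.4], NUM_DISP)
pred_unimodal = laplacian_target(20, NUM_DISP)

print("CE(multi target, bimodal pred): ", cross_entropy(target_multi, pred_bimodal))
print("CE(multi target, unimodal pred):", cross_entropy(target_multi, pred_unimodal))
print("soft-argmax estimate:   ", soft_argmax(pred_bimodal))
print("dominant-mode estimate: ", dominant_mode_estimate(pred_bimodal))
```

In this toy setting, the plain expectation over a bimodal prediction falls between the foreground and background disparities, which is the kind of bleeding artifact the abstract mentions. The windowed estimate stays at the dominant peak.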

Code Implementations: 1 repo
