On the Overconfidence Problem in Semantic 3D Mapping
This work addresses a critical issue for real-time robotic perception and navigation and improves downstream tasks such as ObjectNav, though the contribution is incremental because it builds on existing fusion methods.
The paper tackles the overconfidence problem in semantic 3D mapping, where conventional methods assign high confidence to incorrect maps, and proposes a learned pipeline (GLFS) that achieves higher accuracy and better calibration while maintaining real-time performance on the ScanNet dataset.
Semantic 3D mapping, the process of fusing depth and image segmentation information across multiple views to build 3D maps annotated with object classes in real time, is a recent topic of interest. This paper highlights the fusion overconfidence problem, in which conventional mapping methods assign high confidence to the entire map even when it is incorrect, leading to miscalibrated outputs. Several methods to improve uncertainty calibration at different stages of the fusion pipeline are presented and compared on the ScanNet dataset. We show that the most widely used Bayesian fusion strategy is among the worst calibrated, and propose a learned pipeline that combines fusion and calibration, GLFS, which simultaneously achieves higher accuracy and better 3D map calibration while retaining real-time capability. We further illustrate the importance of map calibration on a downstream task by showing that incorporating proper semantic fusion into a modular ObjectNav agent improves its success rates. Our code will be provided on GitHub for reproducibility upon acceptance.
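To make the fusion overconfidence problem concrete, the following is a minimal sketch, not the paper's implementation and not GLFS. It assumes that "Bayesian fusion" means multiplying per-view class probabilities for a voxel and renormalizing, which treats views as independent, and it uses expected calibration error (ECE) as one common calibration measure. The function names, the 20-view toy voxel, and the bin count are all hypothetical choices made for illustration.

```python
import numpy as np

def bayesian_fuse(view_probs):
    """Naive Bayesian fusion (assumed form): sum log-probabilities across
    views and renormalize, i.e. treat every view as independent evidence."""
    log_post = np.sum(np.log(np.clip(view_probs, 1e-12, 1.0)), axis=0)
    log_post -= log_post.max()  # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def average_fuse(view_probs):
    """Baseline for comparison: arithmetic mean of per-view probabilities."""
    return np.mean(view_probs, axis=0)

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the sample-weighted
    mean gap between accuracy and average confidence in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical voxel: 20 views, each only 60% sure of class 0 (3 classes).
views = np.tile([0.6, 0.3, 0.1], (20, 1))
bayes = bayesian_fuse(views)   # confidence in class 0 saturates near 1
avg = average_fuse(views)      # confidence in class 0 stays at 0.6

print("Bayesian fused confidence:", bayes[0])
print("Averaged fused confidence:", avg[0])
```

In this toy case every view is only moderately confident, yet the product rule drives the fused confidence close to 1 because correlated views are counted as independent evidence. If such voxels are wrong about half the time, their ECE approaches 0.5, which is the kind of miscalibration the abstract describes.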