Modular Sensor Fusion for Semantic Segmentation
This work addresses robustness and data-efficiency issues in sensor fusion for robotic perception, though it appears incremental because it builds on existing statistical methods.
The paper tackles multi-sensor semantic segmentation by analyzing modular statistical fusion approaches that improve robustness and reduce data requirements, achieving an IoU gain of up to 5% over the best single-modality results.
Sensor fusion is a fundamental process in robotic systems, as it extends the perceptual range and increases robustness in real-world operations. Current multi-sensor, deep-learning-based semantic segmentation approaches either do not provide robustness to classes that underperform in one modality, or require a specific architecture with access to the full aligned multi-sensor training data. In this work, we analyze statistical fusion approaches for semantic segmentation that overcome these drawbacks while maintaining competitive performance. The studied approaches are modular by construction: each modality can be trained on a different training set, and only a much smaller subset is needed to calibrate the statistical models. We evaluate a range of statistical fusion approaches and report their performance against state-of-the-art baselines on both real-world and simulated data. In our experiments, statistical fusion improves IoU over the best single-modality segmentation results by up to 5%. We make all implementations and configurations publicly available.
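To make the idea of modular statistical fusion concrete, the following is a minimal sketch of one possible fusion rule, not the specific methods evaluated in this work. The abstract does not name the fusion rules, so the sketch assumes naive Bayes fusion over per-modality confusion matrices: each modality's segmentation network is treated as a black box, a small labeled calibration set is used to estimate how often each true class is predicted as each output class, and the per-pixel predictions are then combined under the assumption that modalities are conditionally independent given the true class. The function names `calibrate_confusion` and `bayes_fuse`, the Laplace smoothing, the uniform prior, and the toy data are all illustrative assumptions.

```python
from math import exp, log


def calibrate_confusion(predictions, labels, num_classes, smoothing=1.0):
    """Estimate P(predicted class | true class) for one modality.

    predictions and labels are flat lists of class indices taken from a
    small calibration set. Returns a matrix indexed as conf[true][pred].
    Laplace smoothing (an assumption of this sketch) avoids zero likelihoods.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for pred, true in zip(predictions, labels):
        counts[true][pred] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def bayes_fuse(pixel_predictions, confusions, prior=None):
    """Fuse the hard predictions of several modalities for a single pixel.

    pixel_predictions[m] is the class predicted by modality m and
    confusions[m] is that modality's calibrated matrix. Assumes the
    modalities are conditionally independent given the true class.
    Returns the posterior over true classes as a list of probabilities.
    """
    num_classes = len(confusions[0])
    if prior is None:
        prior = [1.0 / num_classes] * num_classes  # uniform prior (assumption)
    log_post = [log(p) for p in prior]
    for pred, conf in zip(pixel_predictions, confusions):
        for k in range(num_classes):
            log_post[k] += log(conf[k][pred])
    # Normalize in a numerically stable way.
    peak = max(log_post)
    unnormalized = [exp(v - peak) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy calibration data (hypothetical): two classes, two modalities.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rgb_calib = [0, 0, 0, 0, 0, 1, 0, 1]    # RGB often misses class 1
depth_calib = [0, 0, 0, 1, 1, 1, 1, 1]  # depth is reliable on class 1

rgb_conf = calibrate_confusion(rgb_calib, labels, num_classes=2)
depth_conf = calibrate_confusion(depth_calib, labels, num_classes=2)

# A pixel where the modalities disagree: RGB says class 0, depth says class 1.
posterior = bayes_fuse([0, 1], [rgb_conf, depth_conf])
fused_class = max(range(len(posterior)), key=lambda k: posterior[k])
print(posterior, fused_class)  # posterior is approximately [0.4, 0.6]; fused class is 1
```

In this toy case the fused decision follows the depth prediction, because the calibration data shows that RGB frequently outputs class 0 even when the true class is 1. This illustrates how a calibrated fusion rule can compensate for a class that underperforms in one modality. The segmentation networks themselves are never retrained, which is what makes the scheme modular.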