CV | Feb 19, 2024

Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

arXiv:2402.11826v1 | 6 citations | h-index: 19 | ICRA
Originality: Incremental advance
AI Analysis

This work addresses depth estimation in difficult scenarios for applications in 3D vision, but it is incremental as it builds on existing multi-modal approaches.

The paper tackles the degradation of monocular depth estimation in challenging environments such as nighttime or adverse weather. It proposes a multi-modal fusion framework that integrates RGB and long-wave infrared images, and it reports robust depth estimation on the MS$^2$ and ViViD++ datasets.

Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such conditions, they are inherently low-resolution and lack the rich texture and semantics delivered by the RGB image. Current methods focus solely on a single modality because of the difficulty of identifying and integrating faithful depth cues from both sources. To address these issues, this paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. Concretely, we independently compute coarse depth maps with separate networks, fully utilizing the individual depth cues from each modality. Because the advantageous depth cues are spread across both modalities, we propose a novel confidence loss that steers a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner. With the proposed pipeline, our method achieves robust depth estimation in a variety of difficult scenarios. Experimental results on the challenging MS$^2$ and ViViD++ datasets demonstrate the effectiveness and robustness of our method.
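
The role a confidence map can play in combining two coarse depth maps is illustrated by the minimal sketch below. It is not the paper's method: the abstract describes a learned fusion network, whereas this sketch substitutes a plain per-pixel convex blend. The function name `fuse_depth`, the convention that a confidence of 1 favours the RGB branch, and the sample values are all assumptions made for illustration.

```python
import numpy as np


def fuse_depth(depth_rgb, depth_thermal, confidence):
    """Blend two coarse depth maps per pixel (hypothetical helper).

    Assumed convention: confidence lies in [0, 1], where 1 means
    "trust the RGB depth" and 0 means "trust the thermal depth".
    """
    depth_rgb = np.asarray(depth_rgb, dtype=np.float64)
    depth_thermal = np.asarray(depth_thermal, dtype=np.float64)
    # Clamp so that the blend stays a convex combination.
    confidence = np.clip(np.asarray(confidence, dtype=np.float64), 0.0, 1.0)
    if not (depth_rgb.shape == depth_thermal.shape == confidence.shape):
        raise ValueError("all inputs must share the same shape")
    return confidence * depth_rgb + (1.0 - confidence) * depth_thermal


# Made-up 2x2 depth maps in metres, purely for demonstration.
depth_rgb = np.array([[2.0, 4.0], [6.0, 8.0]])
depth_thermal = np.array([[3.0, 4.0], [5.0, 10.0]])
confidence = np.array([[1.0, 0.5], [0.0, 0.25]])

fused = fuse_depth(depth_rgb, depth_thermal, confidence)
print(fused)
```

Where the confidence is 1 the fused value equals the RGB estimate, where it is 0 it equals the thermal estimate, and intermediate values interpolate linearly between the two. A learned fusion network, as described in the abstract, can express richer, spatially aware combinations than this fixed rule.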

