CV | Aug 4, 2025

Beyond RGB and Events: Enhancing Object Detection under Adverse Lighting with Monocular Normal Maps

arXiv:2508.02127v1 | 8.41 citations | h-index: 31

Originality: Incremental advance

AI Analysis

The work addresses a critical problem for autonomous driving systems by improving detection accuracy in challenging lighting scenarios, though the advance is incremental because it builds on existing multi-modal fusion approaches.

The paper tackles object detection under adverse lighting conditions, a key requirement for autonomous driving, by proposing NRE-Net, a multi-modal framework that fuses monocularly predicted normal maps, RGB images, and event streams to suppress false positives. It reports mAP50 improvements of up to 7.9% over frame-based approaches such as YOLOX, 2.7% over SFNet on DSEC-Det-sub, and 7.1% over SODFormer on PKU-DAVIS-SOD.
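For readers unfamiliar with the metric, mAP50 counts a predicted box as a match only when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates just that matching criterion. The function names and the `(x1, y1, x2, y2)` box layout are assumptions chosen for illustration, not code from the paper, and a full mAP computation also involves per-class matching and confidence ranking that are omitted here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_iou50(pred_box, gt_box):
    """True when the prediction overlaps the ground truth with IoU >= 0.5."""
    return iou(pred_box, gt_box) >= 0.5
```

A false positive caused by a reflection would typically fail this check against every ground-truth box, which is why suppressing such detections raises mAP50.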

Accurate object detection under adverse lighting conditions is critical for real-world applications such as autonomous driving. Although neuromorphic event cameras have been introduced to handle these scenarios, adverse lighting often induces distracting reflections from tunnel walls or road surfaces, which frequently lead to false obstacle detections. Neither RGB nor event data alone is robust enough to address these complexities, and mitigating these issues without additional sensors remains underexplored. To overcome these challenges, we propose leveraging normal maps, directly predicted from monocular RGB images, as robust geometric cues to suppress false positives and enhance detection accuracy. We introduce NRE-Net, a novel multi-modal detection framework that effectively fuses three complementary modalities: monocularly predicted surface normal maps, RGB images, and event streams. To optimize the fusion process, our framework incorporates two key modules: the Adaptive Dual-stream Fusion Module (ADFM), which integrates RGB and normal map features, and the Event-modality Aware Fusion Module (EAFM), which adapts to the high dynamic range characteristics of event data. Extensive evaluations on the DSEC-Det-sub and PKU-DAVIS-SOD datasets demonstrate that NRE-Net significantly outperforms state-of-the-art methods. Our approach achieves mAP50 improvements of 7.9% and 6.1% over frame-based approaches (e.g., YOLOX), while surpassing the fusion-based SFNet by 2.7% on the DSEC-Det-sub dataset and SODFormer by 7.1% on the PKU-DAVIS-SOD dataset.
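The abstract does not describe how ADFM combines RGB and normal map features internally. As a rough intuition for what adaptive fusion of two feature streams can look like, the sketch below uses a generic per-channel sigmoid gate. This is a stand-in technique, not the authors' module: the function `gated_fusion`, its parameters `w_rgb`, `w_normal`, and `bias`, and the flat-list feature layout are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, normal_feat, w_rgb, w_normal, bias):
    """Blend two feature vectors with a per-channel gate (generic sketch).

    For each channel, a gate g in (0, 1) is computed from both inputs,
    and the output is g * rgb + (1 - g) * normal. All weights are
    hypothetical placeholders; in a trained network they would be learned.
    """
    lengths = {len(rgb_feat), len(normal_feat), len(w_rgb), len(w_normal), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same number of channels")

    fused = []
    for r, n, wr, wn, b in zip(rgb_feat, normal_feat, w_rgb, w_normal, bias):
        gate = sigmoid(wr * r + wn * n + b)  # how much to trust the RGB channel
        fused.append(gate * r + (1.0 - gate) * n)
    return fused


# Example call with placeholder values: zero weights give an even blend.
blended = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

The appeal of a gate like this is that the blend can shift toward the geometric cue in regions where appearance is unreliable, such as reflections, which is the motivation the abstract gives for adding normal maps.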
