CV | LG | Sep 14, 2025

Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision

arXiv:2509.11476v1 | 4 citations | h-index: 6 | 2025 6th International Conference on Computer Vision and Data Mining (ICCVDM)
Originality: Incremental advance
AI Analysis

This work addresses multi-modal perception for applications such as object detection and scene understanding. It is an incremental advance: it builds on existing fusion methods and adds a modality-aware attention mechanism, a pixel-wise alpha blending module, and a target-aware loss.

The paper tackles infrared and visible image fusion by proposing FusionNet, which integrates complementary cues through a modality-aware attention mechanism and target-aware supervision. The authors report enhanced semantic preservation and perceptual quality on the M3FD dataset.

Infrared and visible image fusion (IVIF) is a fundamental task in multi-modal perception that aims to integrate complementary structural and textural cues from different spectral domains. In this paper, we propose FusionNet, a novel end-to-end fusion framework that explicitly models inter-modality interaction and enhances task-critical regions. FusionNet introduces a modality-aware attention mechanism that dynamically adjusts the contribution of infrared and visible features based on their discriminative capacity. To achieve fine-grained, interpretable fusion, we further incorporate a pixel-wise alpha blending module, which learns spatially-varying fusion weights in an adaptive and content-aware manner. Moreover, we formulate a target-aware loss that leverages weak ROI supervision to preserve semantic consistency in regions containing important objects (e.g., pedestrians, vehicles). Experiments on the public M3FD dataset demonstrate that FusionNet generates fused images with enhanced semantic preservation, high perceptual quality, and clear interpretability. Our framework provides a general and extensible solution for semantic-aware multi-modal image fusion, with benefits for downstream tasks such as object detection and scene understanding.
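
The abstract names a pixel-wise alpha blending module and a target-aware loss but gives no formulas, so the following is only a minimal sketch of how the two ideas could fit together, not the authors' implementation. It assumes grayscale images stored as nested lists with values in [0, 1], an alpha map that is already given (in FusionNet it would be predicted by the network), a binary ROI mask standing in for the weak ROI supervision, and a per-pixel reference of `max(ir, vis)`, which is a common choice in fusion losses and not one confirmed by the paper. The names `alpha_blend`, `target_aware_l1`, and `roi_weight` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values assumed to lie in [0, 1]


def alpha_blend(ir: Image, vis: Image, alpha: Image) -> Image:
    """Pixel-wise blend: fused = alpha * ir + (1 - alpha) * vis.

    `alpha` is a per-pixel weight map; here it is passed in directly,
    whereas a learned model would predict it from the inputs.
    """
    return [
        [a * i + (1.0 - a) * v for i, v, a in zip(ir_row, vis_row, a_row)]
        for ir_row, vis_row, a_row in zip(ir, vis, alpha)
    ]


def target_aware_l1(
    fused: Image,
    ir: Image,
    vis: Image,
    roi_mask: List[List[int]],
    roi_weight: float = 5.0,
) -> float:
    """Weighted mean absolute error against max(ir, vis).

    Pixels inside the ROI (mask value 1) count `roi_weight` times as much
    as background pixels. Both the reference image and the weighting
    scheme are assumptions made for illustration.
    """
    total = 0.0
    norm = 0.0
    for f_row, i_row, v_row, m_row in zip(fused, ir, vis, roi_mask):
        for f, i, v, m in zip(f_row, i_row, v_row, m_row):
            w = roi_weight if m else 1.0
            total += w * abs(f - max(i, v))
            norm += w
    return total / norm


if __name__ == "__main__":
    ir = [[0.9, 0.1], [0.2, 0.2]]
    vis = [[0.1, 0.5], [0.6, 0.2]]
    alpha = [[1.0, 0.0], [0.5, 0.5]]  # 1.0 keeps IR, 0.0 keeps visible
    roi = [[0, 0], [1, 0]]            # one pixel marked as a target region

    fused = alpha_blend(ir, vis, alpha)
    print(fused)
    print(target_aware_l1(fused, ir, vis, roi))
```

Because the alpha map is an explicit per-pixel weight, it can be inspected directly to see which modality dominates each region, which is one way to read the abstract's interpretability claim. Raising `roi_weight` makes errors inside the marked regions cost more than the same errors in the background.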

