CV Jun 11, 2024

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

arXiv:2407.06159v33.71 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses two gaps in image fusion: the neglect of inter-feature relationships and the loss of high-frequency information. Both matter for downstream applications such as object detection and semantic segmentation. The contribution appears incremental, however, because it adds new modules to an existing encoder-decoder design.

The paper tackles multi-modality image fusion by modeling correlation-driven feature decomposition and reasoning over a high-level graph representation to extract complementary information efficiently. It reports competitive results in visible/infrared and medical image fusion tasks, and it surpasses state-of-the-art methods in downstream tasks, with mAP@0.5 higher by 8.27% on average in object detection and mIoU higher by 5.85% in semantic segmentation.

Multi-modality image fusion aims at fusing modality-specific (complementarity) and modality-shared (correlation) information from multiple source images. To address the neglect of inter-feature relationships, the loss of high-frequency information, and the limited attention to downstream tasks, this paper focuses on how to model correlation-driven feature decomposition and reason over a high-level graph representation by efficiently extracting complementary information and aggregating multi-guided features. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. First, shallow features from the individual modalities are extracted by a depthwise convolution layer combined with a transformer block. In the three parallel branches of the encoder, the Cross Attention and Invertible Block (CAI) extracts local features and preserves high-frequency texture details. The Base Feature Extraction Module (BFE) captures long-range dependencies and enhances modality-shared information. The Graph Reasoning Module (GR) is introduced to reason about high-level cross-modality relations and simultaneously extract low-level detail features as modality-specific complementary information for CAI. Experiments demonstrate competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, the proposed algorithm surpasses the state-of-the-art methods on subsequent tasks, scoring on average 8.27% higher mAP@0.5 in object detection and 5.85% higher mIoU in semantic segmentation. The code is available at https://github.com/Abraham-Einstein/SMFNet/.
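
The two downstream metrics quoted above, mAP@0.5 and mIoU, both rest on intersection over union. The following is a minimal Python sketch of that shared building block, not the paper's evaluation code: the function names (`box_iou`, `is_true_positive`, `mean_iou`) and the flat label-list input are assumptions made for illustration. It shows only the IoU >= 0.5 matching criterion behind mAP@0.5; a full mAP computation would also rank detections by confidence and average precision over recall levels and classes.

```python
from typing import Sequence, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as a match at mAP@0.5 when its IoU is at least 0.5."""
    return box_iou(pred, truth) >= threshold


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over flattened per-pixel class labels.

    Classes that appear in neither prediction nor ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages a class-0 IoU of 1/2 and a class-1 IoU of 2/3.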
