CV · Jun 20, 2025

TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion

arXiv:2506.16730v1 · 2 citations · h-index: 14
Originality: Incremental advance
AI Analysis

TeSG addresses the challenge of insufficient textual guidance in infrared and visible image fusion, which is relevant to applications such as surveillance and autonomous systems. It represents an incremental improvement over existing methods.

The paper tackles the problem of effectively integrating textual semantic information into infrared and visible image fusion (IVF). It proposes TeSG, which uses mask and text semantics derived from vision-language models to guide the fusion process, and reports competitive performance on downstream tasks such as detection and segmentation.

Infrared and visible image fusion (IVF) aims to combine complementary information from both image modalities, producing more informative and comprehensive outputs. Recently, text-guided IVF has shown great potential due to its flexibility and versatility. However, the effective integration and utilization of textual semantic information remains insufficiently studied. To tackle this challenge, we introduce textual semantics at two levels: the mask semantic level and the text semantic level, both derived from textual descriptions extracted by large Vision-Language Models (VLMs). Building on this, we propose Textual Semantic Guidance for infrared and visible image fusion, termed TeSG, which guides the image synthesis process in a way that is optimized for downstream tasks such as detection and segmentation. Specifically, TeSG consists of three core components: a Semantic Information Generator (SIG), a Mask-Guided Cross-Attention (MGCA) module, and a Text-Driven Attentional Fusion (TDAF) module. The SIG generates mask and text semantics based on textual descriptions. The MGCA module performs initial attention-based fusion of visual features from both infrared and visible images, guided by mask semantics. Finally, the TDAF module refines the fusion process with gated attention driven by text semantics. Extensive experiments demonstrate the competitiveness of our approach, particularly in terms of performance on downstream tasks, compared to existing state-of-the-art methods.
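
The two-stage flow described in the abstract (mask-guided cross-attention followed by a text-driven gate) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the layer definitions, so the function names, the log-mask attention bias, the residual connection, and the sigmoid gate below are all assumptions chosen only to show the general idea. Features are treated as flat token arrays, and the mask and text embedding stand in for the outputs of the SIG.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_cross_attention(ir, vis, mask):
    """Hypothetical MGCA-like step.

    ir, vis: (N, D) infrared / visible token features.
    mask:    (N,) semantic mask values in [0, 1], one per visible token.
    """
    d = ir.shape[1]
    # Queries come from infrared tokens, keys and values from visible tokens.
    scores = ir @ vis.T / np.sqrt(d)
    # Assumption: a log-mask bias steers attention toward masked-in tokens.
    scores = scores + np.log(mask + 1e-6)[None, :]
    attn = softmax(scores, axis=-1)
    # Assumption: residual connection around the attended visible features.
    return ir + attn @ vis, attn

def text_driven_gated_fusion(coarse, vis, text_emb, w_gate):
    """Hypothetical TDAF-like step.

    text_emb: (T,) text semantic embedding.
    w_gate:   (T, D) projection producing one gate value per channel.
    """
    gate = 1.0 / (1.0 + np.exp(-(text_emb @ w_gate)))  # sigmoid, shape (D,)
    # Channel-wise convex combination of the coarse fusion and visible features.
    return gate * coarse + (1.0 - gate) * vis

# Toy inputs (random stand-ins for real features and SIG outputs).
rng = np.random.default_rng(0)
N, D, T = 6, 8, 4
ir = rng.normal(size=(N, D))
vis = rng.normal(size=(N, D))
mask = np.array([1, 1, 0, 0, 1, 0], dtype=float)
text_emb = rng.normal(size=T)
w_gate = rng.normal(size=(T, D))

coarse, attn = mask_guided_cross_attention(ir, vis, mask)
fused = text_driven_gated_fusion(coarse, vis, text_emb, w_gate)
print(fused.shape)  # (6, 8)
```

In this sketch the mask only reweights which visible tokens each infrared token attends to, while the text embedding decides, per channel, how much of the attended result is kept. A real model would use learned projections for queries, keys, and values and would operate on 2D feature maps.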

