CV · Sep 5, 2025

Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework

arXiv:2509.05000v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem in image fusion: handling degraded inputs robustly without manual pre-enhancement steps. It appears incremental, since it builds on existing fusion methods; the novelty lies in how it integrates VLMs.

The paper tackles infrared-visible image fusion in dual-source degraded scenarios, where existing methods struggle because they assume high-quality inputs. It proposes GD^2Fusion, a framework that integrates vision-language models for degradation perception with dual-domain optimization, and reports superior fusion performance compared with existing algorithms.

Most existing infrared-visible image fusion (IVIF) methods assume high-quality inputs, and therefore struggle to handle dual-source degraded scenarios, typically requiring manual selection and sequential application of multiple pre-enhancement steps. This decoupled pre-enhancement-to-fusion pipeline inevitably leads to error accumulation and performance degradation. To overcome these limitations, we propose Guided Dual-Domain Fusion (GD^2Fusion), a novel framework that synergistically integrates vision-language models (VLMs) for degradation perception with dual-domain (frequency/spatial) joint optimization. Concretely, the designed Guided Frequency Modality-Specific Extraction (GFMSE) module performs frequency-domain degradation perception and suppression and discriminatively extracts fusion-relevant sub-band features. Meanwhile, the Guided Spatial Modality-Aggregated Fusion (GSMAF) module carries out cross-modal degradation filtering and adaptive multi-source feature aggregation in the spatial domain to enhance modality complementarity and structural consistency. Extensive qualitative and quantitative experiments demonstrate that GD^2Fusion achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios. The code will be publicly released after acceptance of this paper.
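To make the idea of sub-band extraction followed by cross-modal aggregation concrete, here is a minimal sketch of a classical wavelet-style fusion baseline. It is not the GD^2Fusion method and involves no VLM guidance or degradation handling: it splits each image into one low-frequency and three detail sub-bands with a one-level Haar transform, averages the low-frequency bands, and keeps the larger-magnitude detail coefficient from either modality. The function names (`haar2d`, `ihaar2d`, `fuse`) and the fusion rule are assumptions chosen for illustration, and inputs are assumed to be same-sized 2D lists with even height and width.

```python
# Illustrative classical baseline, NOT the GD^2Fusion method.
# All function names and the fusion rule are hypothetical.

def haar2d(img):
    """One-level 2D Haar split of an even-sized 2D list into four sub-bands."""
    avg, dx, dy, dxy = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 4)  # low-frequency average
            rows[1].append((a - b + c - d) / 4)  # left-right detail
            rows[2].append((a + b - c - d) / 4)  # top-bottom detail
            rows[3].append((a - b - c + d) / 4)  # diagonal detail
        for band, row in zip((avg, dx, dy, dxy), rows):
            band.append(row)
    return avg, dx, dy, dxy


def ihaar2d(avg, dx, dy, dxy):
    """Invert haar2d, rebuilding each 2x2 block from its four coefficients."""
    out = []
    for i in range(len(avg)):
        top, bottom = [], []
        for j in range(len(avg[0])):
            s, x, y, z = avg[i][j], dx[i][j], dy[i][j], dxy[i][j]
            top += [s + x + y + z, s - x + y - z]
            bottom += [s + x - y - z, s - x - y + z]
        out += [top, bottom]
    return out


def fuse(ir, vis):
    """Fuse two same-sized images: average the low band, max-abs the details."""
    ir_bands, vis_bands = haar2d(ir), haar2d(vis)
    fused = [[[(p + q) / 2 for p, q in zip(r1, r2)]
              for r1, r2 in zip(ir_bands[0], vis_bands[0])]]
    for b1, b2 in zip(ir_bands[1:], vis_bands[1:]):
        fused.append([[p if abs(p) >= abs(q) else q for p, q in zip(r1, r2)]
                      for r1, r2 in zip(b1, b2)])
    return ihaar2d(*fused)
```

For example, fusing a flat infrared patch `[[4, 4], [4, 4]]` with a visible patch `[[0, 8], [0, 8]]` that contains a vertical edge returns `[[0.0, 8.0], [0.0, 8.0]]`: the two low-frequency averages agree, and the edge detail from the visible image is retained. A fixed rule like this has no notion of degradation, which is the gap the abstract says the guided modules are meant to address.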
