CV | Apr 9, 2025

Distilling Textual Priors from LLM to Efficient Image Fusion

arXiv:2504.07029v3 | 1 citation | h-index: 11 | Has Code | IEEE Transactions on Circuits and Systems for Video Technology (Print)
Originality: Incremental advance
AI Analysis

This work addresses the computational overhead problem in image fusion for researchers and practitioners, offering an incremental improvement in efficiency.

The paper tackles the computational inefficiency of text-guided multi-modality image fusion by distilling large model priors from a teacher network into a smaller student network. The student uses 10% of the teacher's parameters and inference time, retains 90% of its performance, and outperforms existing SOTA methods.

Multi-modality image fusion aims to synthesize a single, comprehensive image from multiple source inputs. Traditional approaches, such as CNNs and GANs, offer efficiency but struggle to handle low-quality or complex inputs. Recent advances in text-guided methods leverage large model priors to overcome these limitations, but at the cost of significant computational overhead, both in memory and inference time. To address this challenge, we propose a novel framework for distilling large model priors, eliminating the need for text guidance during inference while dramatically reducing model size. Our framework utilizes a teacher-student architecture, where the teacher network incorporates large model priors and transfers this knowledge to a smaller student network via a tailored distillation process. Additionally, we introduce a spatial-channel cross-fusion module to enhance the model's ability to leverage textual priors across both spatial and channel dimensions. Our method achieves a favorable trade-off between computational efficiency and fusion quality. The distilled network, requiring only 10% of the parameters and inference time of the teacher network, retains 90% of its performance and outperforms existing SOTA methods. Extensive experiments demonstrate the effectiveness of our approach. The implementation will be made publicly available as an open-source resource.
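
The teacher-student transfer described in the abstract can be illustrated with a generic distillation objective. The abstract does not specify the paper's tailored distillation process, so the sketch below is only an assumption: it uses a common pattern in which the student is penalized for deviating from the teacher's fused output and from the teacher's intermediate features. The function names `mse` and `distillation_loss`, the `feat_weight` parameter, and the toy values are all hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, non-empty flat vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    feat_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Output-matching term plus a weighted feature-matching term.

    This is a generic feature-based distillation loss, assumed here for
    illustration; the paper's actual objective may differ.
    """
    output_term = mse(student_out, teacher_out)
    feature_term = mse(student_feat, teacher_feat)
    return output_term + feat_weight * feature_term


# Toy values standing in for flattened fused images and intermediate features.
teacher_out = [0.8, 0.2, 0.5, 0.9]
student_out = [0.7, 0.3, 0.5, 0.8]
teacher_feat = [1.0, 0.0]
student_feat = [0.5, 0.5]

loss = distillation_loss(student_out, teacher_out, student_feat, teacher_feat)
print(f"distillation loss: {loss:.4f}")
```

In this pattern the teacher, which incorporates the textual priors, is only needed during training. At inference time the student runs alone, which is consistent with the abstract's claim that text guidance is no longer required.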

Code Implementations: 1 repo
