CV · May 12

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

arXiv:2605.11444 · 8.1

Predicted impact top 71% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the challenge of modeling continuous relational structure in composite degradations for all-in-one image restoration, which is important for practical applications requiring a unified restoration framework.

The paper proposes a multimodal large language model (MLLM)-guided framework for all-in-one image restoration, using MLLM-derived features to enhance degradation-aware representations and a mixture-of-frequency-experts module for adaptive frequency combination. The method achieves state-of-the-art performance on the CDD11 dataset, outperforming previous methods by up to 1.35 dB.

All-in-one image restoration seeks to recover clean images from inputs affected by diverse and unknown degradations using a unified framework. Recent methods have shown strong performance by identifying degradation characteristics to guide the restoration process. However, many of them treat degradations as discrete categories, which limits their ability to model the continuous relational structure that arises in composite degradations. To address this issue, we propose a multimodal large language model (MLLM)-guided image restoration framework that exploits multimodal embeddings as guidance for low-level restoration. Specifically, MLLM-derived features are injected into an encoder-decoder architecture through an MLLM-guided fusion block (MGFB) to enhance degradation-aware representations. In addition, we incorporate a mixture-of-frequency-experts (MoFE) module that adaptively combines frequency experts using MLLM-guided contextual cues. To further improve expert routing, we design an MLLM-guided router with a relational alignment loss that encourages routing patterns consistent with the embedding-space relationships of degraded inputs. Extensive experiments on multiple benchmarks show that the proposed method achieves strong performance across diverse restoration settings and establishes a new state of the art on the challenging CDD11 dataset, outperforming previous methods by up to 1.35 dB.
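The routing idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give the form of the router or of the relational alignment loss, so everything here is an assumption made for illustration. The names `route`, `mix_experts`, and `relational_alignment_loss` are hypothetical, the router is a single linear layer followed by a softmax, and the loss simply matches pairwise cosine similarities between guidance embeddings and routing distributions.

```python
import math

# Illustrative sketch only: the linear router and the cosine-matching loss are
# assumptions, not the formulation used in the paper.

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def route(embedding, router_weights):
    """Map a guidance embedding to a probability distribution over experts."""
    logits = [sum(w * e for w, e in zip(row, embedding)) for row in router_weights]
    return softmax(logits)

def mix_experts(expert_outputs, gate):
    """Gate-weighted sum of per-expert outputs (flat lists of equal length)."""
    n = len(expert_outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, expert_outputs)) for i in range(n)]

def relational_alignment_loss(embeddings, gates):
    """Mean squared gap between pairwise similarities in embedding space
    and pairwise similarities of the corresponding routing distributions."""
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            diff = cosine(embeddings[i], embeddings[j]) - cosine(gates[i], gates[j])
            total += diff * diff
            pairs += 1
    return total / pairs if pairs else 0.0

# Toy usage: three hypothetical guidance embeddings and two experts
# (think of them as a low-frequency and a high-frequency branch).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
router_weights = [[2.0, 0.0], [0.0, 2.0]]  # one row of weights per expert
gates = [route(e, router_weights) for e in embeddings]
loss = relational_alignment_loss(embeddings, gates)
```

Under these assumptions, inputs whose embeddings are close (the first two above) are pushed toward similar expert mixtures, while a dissimilar input (the third) is free to route differently, which is the behavior the abstract attributes to the relational alignment loss.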
