CV | Dec 21, 2025

SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback

arXiv:2512.18599v1 | 2 citations | h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the efficiency and annotation limitations of image restoration agents, which matters for applications that need fast, label-free processing. The advance is incremental, since it builds on existing agent and MLLM frameworks.

The paper tackles complex image restoration from multiple degradations by proposing a lightweight agent that learns tool-calling sequences via policy optimization, using MLLM perceptual feedback for training without labels. The method matches state-of-the-art performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.

Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations such as blur, noise, rain, and compression artifacts. Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from significant efficiency bottlenecks due to reflection, rollback, and iterative tool searching. Moreover, their performance heavily depends on degradation recognition models that require extensive annotations for training, limiting their applicability in label-free environments. To address these limitations, we propose a policy optimization-based restoration framework that learns a lightweight agent to determine tool-calling sequences. The agent operates in a sequential decision process, selecting the most appropriate restoration operation at each step to maximize final image quality. To enable training within label-free environments, we introduce a novel reward mechanism driven by multimodal large language models, which act as human-aligned evaluators and provide perceptual feedback for policy improvement. Once trained, our agent executes deterministic restoration plans without redundant tool invocations, significantly accelerating inference while maintaining high restoration quality. Extensive experiments show that despite using no supervision, our method matches SOTA performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.
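The sequential decision process described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation, and everything in it is an assumption made for illustration: the tool names, the `mock_perceptual_score` function standing in for MLLM feedback, the per-call cost, and the choice of REINFORCE with a running-mean baseline (the abstract says only "policy optimization"). The toy policy is indexed by step number, whereas a real agent would presumably condition on image features.

```python
import math
import random

# Hypothetical tool set; the abstract does not list the actual tools.
TOOLS = ["deblur", "denoise", "derain", "stop"]
FIXES = {"deblur": "blur", "denoise": "noise", "derain": "rain"}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mock_perceptual_score(remaining):
    """Stand-in for the MLLM evaluator (assumption): it rates only the
    final result, scoring higher when fewer degradations remain."""
    return 1.0 - 0.5 * len(remaining)


def rollout(logits, degradations, rng=None):
    """Run one episode. With rng=None the agent acts greedily, which
    yields a deterministic plan; otherwise it samples from the policy."""
    remaining = set(degradations)
    plan, trace = [], []
    for step in range(len(logits)):
        probs = softmax(logits[step])
        if rng is None:
            action = max(range(len(TOOLS)), key=lambda a: probs[a])
        else:
            action = rng.choices(range(len(TOOLS)), weights=probs)[0]
        trace.append((step, action, probs))
        if TOOLS[action] == "stop":
            break
        plan.append(TOOLS[action])
        remaining.discard(FIXES[TOOLS[action]])
    # Small per-call cost (assumption) discourages redundant tool calls.
    reward = mock_perceptual_score(remaining) - 0.05 * len(plan)
    return plan, reward, trace


def train(degradations, max_steps=3, episodes=2000, lr=0.2, seed=0):
    """REINFORCE with a running-mean baseline over a step-indexed policy."""
    rng = random.Random(seed)
    logits = [[0.0] * len(TOOLS) for _ in range(max_steps)]
    baseline = 0.0
    for _ in range(episodes):
        _, reward, trace = rollout(logits, degradations, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for step, action, probs in trace:
            for a in range(len(TOOLS)):
                # Gradient of log-softmax: one-hot(action) - probs.
                grad = (1.0 if a == action else 0.0) - probs[a]
                logits[step][a] += lr * advantage * grad
    return logits


if __name__ == "__main__":
    trained = train({"blur", "noise"})
    plan, reward, _ = rollout(trained, {"blur", "noise"})
    print(plan, reward)
```

The reward is computed once, from the final result, so training needs neither a clean reference image nor degradation labels. At inference the greedy rollout makes a single pass with no reflection or rollback, which is the source of the speedup the abstract claims.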

