CV · Oct 9, 2025

RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models

arXiv:2510.08054v2 · 4 citations · h-index: 7
AI Analysis

This work addresses the need for adaptable and transparent image retouching tools for users seeking personalized control, though the contribution is incremental: it applies existing vision language models to a specific domain.

The authors tackled the problem of opaque and data-intensive image retouching by proposing RetouchLLM, a training-free system that uses vision language models to perform interpretable, code-based adjustments on high-resolution images, enabling diverse and user-specific enhancements without requiring large-scale paired data.

Image retouching not only enhances visual quality but also serves as a means of expressing personal preferences and emotions. However, existing learning-based approaches require large-scale paired data and operate as black boxes, making the retouching process opaque and limiting their ability to handle diverse, user- or image-specific adjustments. In this work, we propose RetouchLLM, a training-free white-box image retouching system that requires no training data and performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching, allowing exploration of diverse adjustment paths. It comprises two main modules: a visual critic that identifies differences between the input and reference images, and a code generator that produces executable code. Experiments demonstrate that our approach generalizes well across diverse retouching styles, while natural language-based user interaction enables interpretable and controllable adjustments tailored to user intent.
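The abstract describes the system only at the level of two modules (a visual critic and a code generator) driving a progressive, multi-step loop. As a rough illustration of what critic-in-the-loop, code-based retouching could look like, here is a minimal Python sketch under loud assumptions: in the paper both modules are built on vision language models, whereas here they are replaced by hand-written stand-ins that compare three global statistics (brightness, contrast, saturation) and emit one line of code per step. All names (`adjust_exposure`, `adjust_contrast`, `adjust_saturation`, `critic`, `generate_code`, `retouch`) are hypothetical and are not the authors' API.

```python
import numpy as np

EPS = 1e-6
LUMA = np.array([0.2126, 0.7152, 0.0722])  # Rec. 709 luma weights (sum to 1)


def luminance(img):
    """Per-pixel luma of an RGB float image in [0, 1] with shape (H, W, 3)."""
    return img @ LUMA


# Hypothetical interpretable editing primitives that generated code may call.
def adjust_exposure(img, stops):
    """Scale every channel by 2**stops, like a photographic exposure change."""
    return np.clip(img * 2.0 ** stops, 0.0, 1.0)


def adjust_contrast(img, amount):
    """Stretch (amount > 0) or compress (amount < 0) pixels around mean luma."""
    mean = luminance(img).mean()
    return np.clip(mean + (img - mean) * (1.0 + amount), 0.0, 1.0)


def adjust_saturation(img, amount):
    """Push each pixel away from (amount > 0) or toward its own gray value."""
    gray = luminance(img)[..., None]
    return np.clip(gray + (img - gray) * (1.0 + amount), 0.0, 1.0)


TOOLS = {f.__name__: f for f in (adjust_exposure, adjust_contrast, adjust_saturation)}


def image_stats(img):
    """Global statistics the stand-in critic compares."""
    luma = luminance(img)
    return {"brightness": float(luma.mean()),
            "contrast": float(luma.std()),
            "saturation": float((img.max(axis=-1) - img.min(axis=-1)).mean())}


def critic(current, reference, tol=0.005):
    """Stand-in for the VLM visual critic (hypothetical): report statistics that
    differ by more than tol, as {name: (current_value, reference_value)}."""
    cur, ref = image_stats(current), image_stats(reference)
    return {k: (cur[k], ref[k]) for k in cur if abs(ref[k] - cur[k]) > tol}


def generate_code(diffs, damping=0.5):
    """Stand-in for the code generator (hypothetical): emit one readable line
    that closes part of the largest relative gap (one slider per step)."""
    def log_gap(item):
        cur, ref = item[1]
        return abs(np.log(max(ref, EPS) / max(cur, EPS)))

    name, (cur, ref) = max(diffs.items(), key=log_gap)
    ratio = 1.0 + damping * (max(ref, EPS) / max(cur, EPS) - 1.0)
    if name == "brightness":
        return f"img = adjust_exposure(img, {np.log2(ratio):.4f})"
    if name == "contrast":
        return f"img = adjust_contrast(img, {ratio - 1.0:.4f})"
    return f"img = adjust_saturation(img, {ratio - 1.0:.4f})"


def retouch(image, reference, max_steps=20):
    """Progressive loop: critique, generate code, execute it, repeat.
    Returns the edited image and the executed lines (the edit trace)."""
    img, trace = image.copy(), []
    for _ in range(max_steps):
        diffs = critic(img, reference)
        if not diffs:  # nothing left to fix within tolerance
            break
        line = generate_code(diffs)
        namespace = {"img": img, **TOOLS}
        exec(line, namespace)  # run the generated, human-readable edit
        img = namespace["img"]
        trace.append(line)
    return img, trace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    photo = rng.uniform(0.2, 0.6, size=(64, 64, 3))
    # Synthetic "reference": brighter and more saturated than the input.
    target = adjust_saturation(adjust_exposure(photo, 0.5), 0.4)
    edited, steps = retouch(photo, target)
    print("\n".join(steps))
```

Each entry of `steps` is a readable line of code, so a user could inspect, reorder, or veto individual edits; this is the kind of transparency the abstract attributes to a white-box approach. Adjusting one attribute per step with a damping factor is a design choice of this sketch (it keeps coupled edits, such as exposure also raising saturation, from overshooting), not something the abstract specifies. Because the primitives are per-pixel operations, the same generated lines apply unchanged at any image resolution.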
