RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
This work addresses the need for adaptable and transparent image retouching tools for users seeking personalized control, though it is incremental in applying existing vision language models to a specific domain.
The authors tackle the problem of opaque and data-intensive image retouching by proposing RetouchLLM, a training-free system that uses vision language models to perform interpretable, code-based adjustments on high-resolution images, enabling diverse and user-specific enhancements without requiring large-scale paired data.
Image retouching not only enhances visual quality but also serves as a means of expressing personal preferences and emotions. However, existing learning-based approaches require large-scale paired data and operate as black boxes, making the retouching process opaque and limiting their adaptability to diverse, user- or image-specific adjustments. In this work, we propose RetouchLLM, a training-free, white-box image retouching system that requires no training data and performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching, allowing exploration of diverse adjustment paths. It comprises two main modules: a visual critic that identifies differences between the input and reference images, and a code generator that produces executable code. Experiments demonstrate that our approach generalizes well across diverse retouching styles, while natural language-based user interaction enables interpretable and controllable adjustments tailored to user intent.
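The critic-then-generate loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the models, prompts, or retouching operations, so both modules are replaced here by simple statistics-based stand-ins. All function names (`visual_critic`, `code_generator`, `run_code`, `distance`, `retouch`) are hypothetical, images are assumed to be float NumPy arrays in [0, 1], and the stopping rule (stop when a step no longer moves the image closer to the reference) is an assumption made for illustration.

```python
import numpy as np


def visual_critic(image, reference):
    """Stand-in for the VLM critic: report brightness and contrast gaps."""
    return {
        "brightness": float(reference.mean() - image.mean()),
        "contrast": float(reference.std() / max(image.std(), 1e-6)),
    }


def code_generator(critique):
    """Stand-in for the VLM code generator: emit one line of retouching code."""
    if abs(critique["brightness"]) > abs(critique["contrast"] - 1.0):
        # Brightness is the larger gap: shift all pixels by the difference.
        return f"img = np.clip(img + {critique['brightness']:.4f}, 0.0, 1.0)"
    # Otherwise scale pixel deviations around the mean to adjust contrast.
    gain = critique["contrast"]
    return f"img = np.clip((img - img.mean()) * {gain:.4f} + img.mean(), 0.0, 1.0)"


def run_code(code, image):
    """Execute a generated snippet on a copy of the image and return the result."""
    scope = {"np": np, "img": image.copy()}
    exec(code, scope)  # acceptable for a sketch; untrusted code needs sandboxing
    return scope["img"]


def distance(image, reference):
    """Crude similarity proxy: gap in mean brightness plus gap in contrast."""
    return abs(reference.mean() - image.mean()) + abs(reference.std() - image.std())


def retouch(image, reference, steps=5):
    """Apply one generated edit per step, keeping it only if it helps."""
    history = []
    for _ in range(steps):
        critique = visual_critic(image, reference)
        code = code_generator(critique)
        candidate = run_code(code, image)
        if distance(candidate, reference) >= distance(image, reference):
            break  # no further improvement; stop early
        image = candidate
        history.append(code)
    return image, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.uniform(0.2, 0.5, size=(8, 8, 3))
    target = np.clip(source * 1.2 + 0.2, 0.0, 1.0)  # brighter, higher contrast
    result, applied = retouch(source, target)
    for line in applied:
        print(line)
```

The list of applied snippets is what makes such a pipeline a white box in the sense the abstract uses: each step is a readable line of code that a user could inspect, edit, or replay. In the actual system, the statistics-based stand-ins would be replaced by vision language model calls.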