CV · LG · Aug 25, 2025

SpotEdit: Evaluating Visually-Guided Image Editing Methods

Sara Ghazanfari, Wei-An Lin, Haitong Tian, Ersin Yumer

arXiv:2508.18159v2 · 6.2 · 2 citations · h-index: 29 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better evaluation benchmarks in image editing, serving both researchers and practitioners. It is incremental in that it builds on existing methods rather than proposing a new editing technique.

The authors tackle the insufficient evaluation of visually-guided image editing methods by introducing SpotEdit, a comprehensive benchmark that systematically assesses diverse generative models. The benchmark reveals significant performance disparities and highlights failure modes such as hallucination in leading models like GPT-4o.

Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain simple and insufficiently representative of real-world editing challenges. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models, uncovering substantial performance disparities. To address a critical yet underexplored challenge, our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at https://github.com/SaraGhazanfari/SpotEdit.
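The abstract does not say how the hallucination component is scored. As a purely illustrative sketch, assume each sample records whether the referenced visual cue is actually present and whether the model went ahead with the edit. A hallucination rate can then be taken as the fraction of cue-absent samples the model edited anyway. The `EditResult` schema and `hallucination_rate` function below are hypothetical and are not taken from the SpotEdit code release.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    """One benchmark sample (hypothetical schema, not SpotEdit's format)."""
    cue_present: bool   # is the referenced visual cue actually in the input image?
    model_edited: bool  # did the model perform the edit instead of declining?


def hallucination_rate(results: list[EditResult]) -> float:
    """Fraction of cue-absent samples on which the model still edited.

    Assumption: a model "hallucinates" when the visual cue is absent but it
    performs the edit anyway. Samples where the cue is present are ignored.
    """
    absent = [r for r in results if not r.cue_present]
    if not absent:
        return 0.0  # no cue-absent samples, so nothing to hallucinate on
    return sum(r.model_edited for r in absent) / len(absent)


# Usage: three cue-absent samples, two of which were (wrongly) edited.
sample = [
    EditResult(cue_present=False, model_edited=True),
    EditResult(cue_present=False, model_edited=False),
    EditResult(cue_present=True, model_edited=True),
    EditResult(cue_present=False, model_edited=True),
]
print(round(hallucination_rate(sample), 3))  # 0.667
```

Restricting the denominator to cue-absent samples keeps this score separate from ordinary editing quality. Under this assumption, a model that always edits scores 1.0 even if its edits look good whenever the cue is present.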

