CV · AI · LG · Jun 11, 2025

EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits

Apple
arXiv:2506.09988v1 · 3 citations · h-index: 19 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses the problem of evaluating text-guided image edits, which is relevant to researchers and practitioners in generative AI, but its contribution is incremental because it builds on existing benchmarks and methods.

The paper tackles the lack of a comprehensive framework for verifying text-guided image edits by introducing EditInspector, a benchmark based on human annotations. It finds that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes, and it proposes two methods that outperform state-of-the-art models in artifact detection and difference caption generation.

Text-guided image editing, fueled by recent advancements in generative AI, is becoming increasingly widespread. This trend highlights the need for a comprehensive framework to verify text-guided edits and assess their quality. To address this need, we introduce EditInspector, a novel benchmark for evaluation of text-guided image edits, based on human annotations collected using an extensive template for edit verification. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits across various dimensions, including accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and the ability to describe edit-induced changes. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes. To address these challenges, we propose two novel methods that outperform SoTA models in both artifact detection and difference caption generation.

