I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
I2I-Bench addresses the problem of limited, manually driven evaluation faced by researchers and practitioners in image editing, though the contribution is incremental in that it builds on existing benchmarking efforts.
The authors tackle the challenge of evaluating image-to-image editing models by proposing I2I-Bench, a benchmark suite spanning 10 task categories and 30 evaluation dimensions scored with automated methods. They use it to benchmark mainstream models and to identify gaps and trade-offs among them.
Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insufficient evaluation dimensions, and heavy reliance on manual annotations, which significantly constrain their scalability and practical applicability. To address this, we propose \textbf{I2I-Bench}, a comprehensive benchmark for image-to-image editing models, which features (i) diverse tasks, encompassing 10 task categories across both single-image and multi-image editing tasks, (ii) comprehensive evaluation dimensions, including 30 decoupled and fine-grained evaluation dimensions with automated hybrid evaluation methods that integrate specialized tools and large multimodal models (LMMs), and (iii) rigorous alignment validation, justifying the consistency between our benchmark evaluations and human preferences. Using I2I-Bench, we benchmark numerous mainstream image editing models, investigating the gaps and trade-offs between editing models across various dimensions. We will open-source all components of I2I-Bench to facilitate future research.
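The abstract does not say which statistic is used for alignment validation. As an illustration only, one common way to check that automated scores agree with human preferences is Spearman rank correlation. The sketch below assumes that choice; the function names and the score lists are hypothetical and are not taken from I2I-Bench.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one automated score and one mean human rating
# per edited image, for a single evaluation dimension.
auto_scores = [0.82, 0.41, 0.67, 0.90, 0.55]
human_scores = [4.5, 2.0, 3.0, 4.0, 3.5]

rho = spearman(auto_scores, human_scores)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

A value near 1 would suggest that the automated metric orders edits much as human raters do. In a benchmark with many decoupled dimensions, such a check would presumably be run per dimension instead of on a single pooled score.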