CV AI MM | Dec 4, 2023

A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

Xiangru Zhu, Penglei Sun, Chengyu Wang, Jingping Liu, Zhixu Li, Yanghua Xiao, Jun Huang

arXiv:2312.02338v211.010 citations | h-index: 20 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation of compositionality in text-to-image models, a need shared by researchers and developers. The contribution is incremental, as it builds on existing benchmarking efforts.

The authors tackle the problem of evaluating compositionality in text-to-image synthesis in two ways: they introduce Winoground-T2I, a benchmark with 11K contrastive sentence pairs, and they propose a strategy to assess metric reliability. Together these yield insights into both model and metric performance.

Text-to-image (T2I) synthesis has recently achieved significant advancements. However, challenges remain in the model's compositionality, which is the ability to create new combinations from known components. We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models. This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 categories. These contrastive sentence pairs with subtle differences enable fine-grained evaluations of T2I synthesis models. Additionally, to address the inconsistency across different metrics, we propose a strategy that evaluates the reliability of various metrics by using comparative sentence pairs. We use Winoground-T2I with a dual objective: to evaluate the performance of T2I models and the metrics used for their evaluation. Finally, we provide insights into the strengths and weaknesses of these metrics and the capabilities of current T2I models in tackling challenges across a range of complex compositional categories. Our benchmark is publicly available at https://github.com/zhuxiangru/Winoground-T2I .
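The abstract does not spell out how contrastive pairs are used to judge a metric, so the following is only a minimal sketch of one plausible reading, in the spirit of Winoground-style pairwise scoring: a metric counts as consistent on a pair if each image scores strictly higher with its own sentence than with the contrasting one. All names here (`pair_is_consistent`, `consistency_rate`, the two toy metrics, and the use of strings as stand-ins for images) are hypothetical and are not taken from the paper or its repository.

```python
from typing import Callable, Sequence, Tuple

# One contrastive example: two sentences plus the image generated for each.
# ASSUMPTION: images are opaque objects; the toy data below uses strings as stand-ins.
Example = Tuple[str, str, object, object]  # (text_a, text_b, image_a, image_b)
Metric = Callable[[str, object], float]    # hypothetical signature: score(text, image)


def pair_is_consistent(metric: Metric, ex: Example) -> bool:
    """Return True if each image scores strictly higher with its own sentence."""
    text_a, text_b, image_a, image_b = ex
    return (metric(text_a, image_a) > metric(text_b, image_a)
            and metric(text_b, image_b) > metric(text_a, image_b))


def consistency_rate(metric: Metric, examples: Sequence[Example]) -> float:
    """Fraction of contrastive pairs on which the metric is consistent."""
    if not examples:
        raise ValueError("need at least one example")
    return sum(pair_is_consistent(metric, ex) for ex in examples) / len(examples)


def bag_of_words(text: str, image: object) -> float:
    """Toy metric that ignores word order: counts shared distinct words."""
    return float(len(set(text.split()) & set(str(image).split())))


def ordered_match(text: str, image: object) -> float:
    """Toy metric that is order-aware: counts position-wise word matches."""
    return float(sum(t == i for t, i in zip(text.split(), str(image).split())))


# Toy data: the "images" are just descriptions of what a perfect model would draw.
EXAMPLES = [
    ("a dog chasing a cat", "a cat chasing a dog",
     "a dog chasing a cat", "a cat chasing a dog"),
]

if __name__ == "__main__":
    for metric in (bag_of_words, ordered_match):
        print(metric.__name__, consistency_rate(metric, EXAMPLES))
```

In this toy setup the order-blind metric cannot separate the two sentences of a pair, while the order-aware one can. That is the kind of difference a contrastive benchmark is meant to expose, although real metrics and images would of course replace these stand-ins.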
