CV, LG · Nov 7, 2023

Holistic Evaluation of Text-To-Image Models

Stanford
arXiv:2311.04287v1 · 197 citations · h-index: 148 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of evaluating the capabilities and risks of text-to-image models for researchers and practitioners. It provides a broad benchmark, but the contribution is incremental because it builds on existing evaluation frameworks.

The authors tackled the lack of comprehensive quantitative evaluation for text-to-image models by introducing the Holistic Evaluation of Text-to-Image Models (HEIM) benchmark, which assesses 12 aspects across 62 scenarios on 26 state-of-the-art models, revealing that no single model excels in all aspects.

The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/v1.1.0 and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase.
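The headline finding, that no single model excels in all aspects, comes down to comparing models aspect by aspect. Below is a minimal Python sketch of that comparison. It assumes per-aspect scores are already normalized so that higher is better; for aspects such as bias or toxicity, that would mean inverting the raw measurement. The model names and scores are hypothetical toy values, not HEIM results. The mean-win-rate summary is an assumed aggregation in the style of HELM leaderboards and is not necessarily HEIM's exact procedure.

```python
from typing import Dict

# model -> aspect -> score (hypothetical; higher is assumed to be better)
Scores = Dict[str, Dict[str, float]]

# Hypothetical toy scores on three of the twelve aspects; NOT HEIM results.
TOY_SCORES: Scores = {
    "model_a": {"alignment": 0.9, "aesthetics": 0.6, "efficiency": 0.3},
    "model_b": {"alignment": 0.7, "aesthetics": 0.9, "efficiency": 0.5},
    "model_c": {"alignment": 0.5, "aesthetics": 0.4, "efficiency": 0.9},
}


def best_per_aspect(scores: Scores) -> Dict[str, str]:
    """Return, for each aspect, the model with the highest score."""
    aspects = list(next(iter(scores.values())))
    return {a: max(scores, key=lambda m: scores[m][a]) for a in aspects}


def mean_win_rate(scores: Scores) -> Dict[str, float]:
    """Average, over aspects, of the fraction of other models each model beats.

    Ties count as half a win. This is an assumed aggregation for illustration.
    """
    models = list(scores)
    aspects = list(next(iter(scores.values())))
    result: Dict[str, float] = {}
    for m in models:
        others = [o for o in models if o != m]
        rates = []
        for a in aspects:
            wins = 0.0
            for o in others:
                if scores[m][a] > scores[o][a]:
                    wins += 1.0
                elif scores[m][a] == scores[o][a]:
                    wins += 0.5
            rates.append(wins / len(others))
        result[m] = sum(rates) / len(rates)
    return result


best = best_per_aspect(TOY_SCORES)
rates = mean_win_rate(TOY_SCORES)
print(best)   # a different model wins each aspect in this toy data
print(rates)
```

With these toy numbers, each aspect has a different winner, so no model dominates. An aggregate such as mean win rate still produces a single ranking, but that ranking hides the per-aspect trade-offs a holistic benchmark is meant to expose.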

Code Implementations: 1 repo