CL, AI, CV, CY, IV; Mar 17, 2024

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

arXiv:2403.11092v1, 33 citations, h-index: 30, NAACL
Originality Synthesis-oriented
AI Analysis

This work addresses reliability issues in multilingual benchmarks for text-to-image models, offering tools for improved metric development, but it is incremental as it focuses on error correction rather than new model capabilities.

The study identified translation errors in the CoCo-CroLa benchmark for text-to-image models, corrected them, and showed that these errors impact assessment results, with corrections affecting image outputs predictably via text similarity scores.

Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and comparing the output image populations. Unfortunately, we find that this benchmark contains translation errors of varying severity in Spanish, Japanese, and Chinese. We provide corrections for these errors and analyze how impactful they are on the utility and validity of CoCo-CroLa as a benchmark. We reassess multiple baseline T2I models with the revisions, compare the outputs elicited under the new translations to those conditioned on the old, and show that a correction's impactfulness on the image-domain benchmark results can be predicted in the text domain with similarity scores. Our findings will guide the future development of T2I multilinguality metrics by providing analytical tools for practical translation decisions.

