CL · Sep 25, 2025

Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication

arXiv:2509.21262v2 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in text-to-image generation, namely prompts containing ambiguous words, but it is incremental because it builds on existing methods for bias reduction.

The paper tackles homonym duplication in diffusion models, where a word with multiple meanings causes several of its senses to be generated in the same image. It introduces a way to measure duplication rates, evaluates several diffusion models, and shows that prompt expansion effectively mitigates the issue.

Homonyms are words with identical spelling but distinct meanings, which pose challenges for many generative models. When a homonym appears in a prompt, diffusion models may generate multiple senses of the word simultaneously, which is known as homonym duplication. This issue is further complicated by an Anglocentric bias, which includes an additional translation step before the text-to-image model pipeline. As a result, even words that are not homonymous in the original language may become homonyms and lose their meaning after translation into English. In this paper, we introduce a method for measuring duplication rates and conduct evaluations of different diffusion models using both automatic evaluation utilizing Vision-Language Models (VLM) and human evaluation. Additionally, we investigate methods to mitigate the homonym duplication problem through prompt expansion, demonstrating that this approach also effectively reduces duplication related to Anglocentric bias. The code for the automatic evaluation pipeline is publicly available.
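
As a rough illustration of what a duplication rate could look like in practice, the sketch below counts an image as duplicated when a judge (a VLM or a human annotator) reports two or more senses of the homonym in it. This is not the paper's pipeline: the `Judgment` record, the two-sense threshold, and the template-based `expand_prompt` helper are all assumptions made for the example, and the paper's prompt expansion is LLM-guided rather than a fixed template.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Judgment:
    """Hypothetical record of one judged image (not a structure from the paper)."""
    prompt: str
    senses_detected: FrozenSet[str]  # senses of the homonym the judge saw in the image


def is_duplicated(judgment: Judgment) -> bool:
    # Assumption: an image counts as duplicated if at least two senses appear.
    return len(judgment.senses_detected) >= 2


def duplication_rate(judgments: Sequence[Judgment]) -> float:
    """Fraction of judged images in which more than one sense was detected."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(is_duplicated(j) for j in judgments) / len(judgments)


def expand_prompt(prompt: str, homonym: str, gloss: str) -> str:
    # Template stand-in for prompt expansion: append the intended sense.
    # A real system would ask an LLM to pick and phrase the sense.
    return f'{prompt} (here "{homonym}" means {gloss})'


judgments = [
    Judgment("a bat on a table", frozenset({"animal", "sports equipment"})),
    Judgment("a bat on a table", frozenset({"animal"})),
    Judgment("a bat on a table", frozenset({"sports equipment"})),
    Judgment("a bat on a table", frozenset({"animal"})),
]
rate = duplication_rate(judgments)
expanded = expand_prompt("a bat on a table", "bat", "a wooden baseball bat")
```

Comparing the rate on images generated from the original prompts with the rate on images generated from the expanded prompts gives a simple before-and-after measure of how much the expansion helps.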

