CV · AI · LG · Aug 7, 2025

UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

arXiv:2508.05399v1 · h-index: 5 · Has Code · IEEE Access
Originality: Incremental advance
AI Analysis

This work addresses a key problem in text-to-image generation for users who need accurate and efficient image synthesis, though the advance is incremental because it builds on existing Masked Generative Transformers.

The paper tackles compositional text-to-image generation with Masked Generative Transformers, which often fail to bind attributes accurately and to align the image with the text. It proposes UNCAGE, a training-free method that improves compositional fidelity with negligible inference overhead and consistently improves performance across benchmarks and metrics.

Text-to-image (T2I) generation has been actively studied using Diffusion Models and Autoregressive Models. Recently, Masked Generative Transformers have gained attention as an alternative to Autoregressive Models to overcome the inherent limitations of causal attention and autoregressive decoding through bidirectional attention and parallel decoding, enabling efficient and high-quality image generation. However, compositional T2I generation remains challenging, as even state-of-the-art Diffusion Models often fail to accurately bind attributes and achieve proper text-image alignment. While Diffusion Models have been extensively studied for this issue, Masked Generative Transformers exhibit similar limitations but have not been explored in this context. To address this, we propose Unmasking with Contrastive Attention Guidance (UNCAGE), a novel training-free method that improves compositional fidelity by leveraging attention maps to prioritize the unmasking of tokens that clearly represent individual objects. UNCAGE consistently improves performance in both quantitative and qualitative evaluations across multiple benchmarks and metrics, with negligible inference overhead. Our code is available at https://github.com/furiosa-ai/uncage.
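
The abstract describes the core idea only at a high level: attention maps are used to prioritize unmasking tokens that clearly represent a single object. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function names, the `weight` parameter, and the specific contrast score (the margin between the strongest and second-strongest object attention at each image token) are all hypothetical choices made for illustration; the paper and repository define the actual scoring.

```python
import numpy as np


def contrastive_scores(attn):
    """Per-token contrast between objects (illustrative assumption).

    attn: array of shape (num_objects, num_tokens), the attention each
    object's text tokens place on each image token.
    Returns an array of shape (num_tokens,) that is large where one
    object dominates and near zero where objects overlap.
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 2 or attn.shape[0] < 2:
        raise ValueError("need attention maps for at least two objects")
    # Sort over the object axis; the last two rows hold the
    # second-largest and largest attention values for every token.
    top2 = np.sort(attn, axis=0)[-2:]
    return top2[1] - top2[0]


def select_tokens_to_unmask(confidence, attn, masked, k, weight=1.0):
    """Pick up to k still-masked token indices to unmask at this step.

    confidence: (num_tokens,) model confidence for each token.
    masked:     (num_tokens,) boolean, True where a token is still masked.
    weight:     hypothetical guidance strength; 0 falls back to plain
                confidence-based unmasking.
    """
    confidence = np.asarray(confidence, dtype=float)
    masked = np.asarray(masked, dtype=bool)
    score = confidence + weight * contrastive_scores(attn)
    # Tokens that are already unmasked can never be selected.
    score = np.where(masked, score, -np.inf)
    k = min(k, int(masked.sum()))
    order = np.argsort(-score, kind="stable")
    return np.sort(order[:k])


if __name__ == "__main__":
    conf = [0.5, 0.5, 0.5, 0.5]
    attn = [[0.9, 0.5, 0.1, 0.4],   # object A
            [0.1, 0.5, 0.8, 0.4]]   # object B
    print(select_tokens_to_unmask(conf, attn, [True] * 4, k=2))
```

With equal confidences, the two tokens where a single object dominates (indices 0 and 2) are chosen first, while tokens attended equally by both objects are deferred to later steps. Because the guidance only reorders which tokens are unmasked, a scheme of this kind needs no retraining, which is consistent with the training-free, low-overhead claim in the abstract.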

Code Implementations: 1 repo
