CV · LG · May 19

Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

arXiv:2605.19532 · 18.4
Predicted impact top 52% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For users of text-to-image diffusion models, this work provides a lightweight, plug-and-play method that improves generation quality without retraining, though the advance is incremental because it builds on known attention mechanisms.

Text-to-image diffusion models are sensitive to random seeds, causing large variations in output quality. The authors propose Attention-Based Seed Selection (ABSS), a training-free method that ranks seeds by analyzing cross-attention to core tokens during early denoising steps, consistently improving text-image alignment and visual quality across three benchmarks for Stable Diffusion variants.

Text-to-image diffusion models can synthesize high-quality images, yet the outcome is notoriously sensitive to the random seed: different initial seeds often yield large variations in image quality and prompt-image alignment. We revisit this "seed effect" and show that attention dynamics over prompt core tokens, the content-bearing words, measured during the first few denoising steps, strongly predict final generation quality. Building on this observation, we introduce Attention-Based Seed Selection (ABSS), a training-free, plug-and-play method that ranks seeds for a given prompt by leveraging cross-attention to core tokens during the denoising process. ABSS requires no finetuning and does not alter the initial noise; it scores and ranks all candidate seeds, keeps only the top-k for full generation, and discards the rest, without relying on a fixed accept/reject threshold. Operating purely at inference time, ABSS can serve as a lightweight pre-selection add-on for existing seed-optimization pipelines, enabling additional gains. Across three benchmarks, extensive experiments show that ABSS enables consistent improvements in text-image alignment and visual quality for Stable Diffusion variants, as corroborated by human preference and alignment metrics.
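The score-rank-keep procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the score used here (mean attention mass on core tokens, averaged over the early denoising steps) is an assumption made only for illustration, as are the names `core_token_score` and `select_top_k_seeds` and the toy attention values. In practice the attention rows would come from a diffusion model's cross-attention layers; here they are hard-coded.

```python
from typing import Dict, List, Sequence

# One attention row per early denoising step; each row holds one weight per prompt token.
AttentionTrace = Sequence[Sequence[float]]


def core_token_score(trace: AttentionTrace, core_idx: Sequence[int]) -> float:
    """Hypothetical score: attention mass on core tokens, averaged over steps.

    This is an illustrative stand-in, not the scoring rule from the paper.
    """
    if not trace:
        raise ValueError("trace must contain at least one denoising step")
    per_step_mass = [sum(step[i] for i in core_idx) for step in trace]
    return sum(per_step_mass) / len(per_step_mass)


def select_top_k_seeds(
    traces: Dict[int, AttentionTrace], core_idx: Sequence[int], k: int
) -> List[int]:
    """Score every candidate seed, rank them, and keep the k best.

    No fixed accept/reject threshold is used: only the relative ranking matters.
    Ties are broken by the smaller seed value so the result is deterministic.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = {seed: core_token_score(t, core_idx) for seed, t in traces.items()}
    ranked = sorted(scores, key=lambda seed: (-scores[seed], seed))
    return ranked[:k]


# Toy data (assumed): 3 candidate seeds, 2 early steps, 4 prompt tokens.
toy_attention = {
    11: [[0.1, 0.5, 0.1, 0.3], [0.1, 0.6, 0.1, 0.2]],
    22: [[0.4, 0.2, 0.3, 0.1], [0.5, 0.1, 0.3, 0.1]],
    33: [[0.1, 0.3, 0.2, 0.4], [0.1, 0.3, 0.2, 0.4]],
}
core_token_indices = [1, 3]  # positions of the content-bearing words (assumed)

print(select_top_k_seeds(toy_attention, core_token_indices, k=2))  # prints [11, 33]
```

Only the seeds returned by the selection step would then be run through the full denoising schedule; the discarded seeds cost just the few early steps needed to collect their attention traces.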

