CV · AI · Sep 28, 2025

Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models

arXiv:2509.23876v2 · h-index: 11
Originality: Highly original
AI Analysis

This work addresses a critical weakness in visual autoregressive models for image generation and improves guidance fidelity in class-conditioned and text-to-image generation.

The paper tackles information inconsistencies in autoregressive image generation models, which scatter guidance signals and produce ambiguous features. It introduces Information-Grounding Guidance (IGG), which uses attention to anchor guidance to semantically important regions. The authors report sharper, more coherent, and semantically grounded images that set a new benchmark for AR-based methods.

Autoregressive (AR) models based on next-scale prediction are rapidly emerging as a powerful tool for image generation, but they face a critical weakness: information inconsistencies between patches across timesteps introduced by progressive resolution scaling. These inconsistencies scatter guidance signals, causing them to drift away from conditioning information and leaving behind ambiguous, unfaithful features. We tackle this challenge with Information-Grounding Guidance (IGG), a novel mechanism that anchors guidance to semantically important regions through attention. By adaptively reinforcing informative patches during sampling, IGG ensures that guidance and content remain tightly aligned. Across both class-conditioned and text-to-image generation tasks, IGG delivers sharper, more coherent, and semantically grounded images, setting a new benchmark for AR-based methods.
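The abstract does not say how IGG computes or applies its attention signal, so the following is only an illustrative sketch and not the paper's method. It shows one simple way guidance could be made token-dependent: standard classifier-free guidance, but with a per-token scale that grows with an importance score. The function name `token_weighted_guidance`, the `attn` importance vector, and the `base_scale` and `boost` parameters are all assumptions introduced here for illustration.

```python
import numpy as np

def token_weighted_guidance(cond_logits, uncond_logits, attn,
                            base_scale=1.5, boost=1.0):
    """Classifier-free guidance with a per-token scale (hypothetical sketch).

    cond_logits, uncond_logits: arrays of shape (num_tokens, vocab_size).
    attn: length-num_tokens non-negative importance scores. This is an
          assumed input, e.g. attention mass each patch receives from the
          conditioning tokens; the abstract does not define it.
    """
    cond_logits = np.asarray(cond_logits, dtype=np.float64)
    uncond_logits = np.asarray(uncond_logits, dtype=np.float64)
    attn = np.asarray(attn, dtype=np.float64)

    # Normalise importance to [0, 1]; a flat map means no token is favoured.
    span = attn.max() - attn.min()
    importance = (attn - attn.min()) / span if span > 0 else np.zeros_like(attn)

    # Informative tokens receive a larger guidance scale than the rest.
    scale = base_scale + boost * importance  # shape (num_tokens,)

    # Standard classifier-free guidance, applied with the per-token scale.
    return uncond_logits + scale[:, None] * (cond_logits - uncond_logits)
```

With `boost=0` this reduces to ordinary classifier-free guidance with a single scale, which makes the sketch easy to compare against a uniform-guidance baseline.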

