CV, AI | Sep 28, 2025

Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models

Ky Dan Nguyen, Hoang Lam Tran, Anh-Dung Dinh, Daochang Liu, Weidong Cai, Xiuying Wang, Chang Xu

arXiv:2509.23876v23.6 | h-index: 11

Originality: Highly original

AI Analysis

This work addresses a critical weakness of visual autoregressive models for image generation, improving guidance fidelity in class-conditioned and text-to-image generation.

The paper targets information inconsistencies in autoregressive image generation models, which scatter guidance signals and produce ambiguous features. It introduces Information-Grounding Guidance (IGG), which uses attention to anchor guidance to semantically important regions. The authors report sharper, more coherent, and more semantically grounded images, which they describe as a new benchmark for AR-based methods.

Autoregressive (AR) models based on next-scale prediction are rapidly emerging as a powerful tool for image generation, but they face a critical weakness: information inconsistencies between patches across timesteps introduced by progressive resolution scaling. These inconsistencies scatter guidance signals, causing them to drift away from conditioning information and leaving behind ambiguous, unfaithful features. We tackle this challenge with Information-Grounding Guidance (IGG), a novel mechanism that anchors guidance to semantically important regions through attention. By adaptively reinforcing informative patches during sampling, IGG ensures that guidance and content remain tightly aligned. Across both class-conditioned and text-to-image generation tasks, IGG delivers sharper, more coherent, and semantically grounded images, setting a new benchmark for AR-based methods.
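The abstract does not give IGG's exact formulation, so the following is only a minimal sketch of the general idea of attention-weighted guidance, not the authors' method. It assumes standard classifier-free guidance (guided = uncond + scale * (cond - uncond)) and raises the scale for patches with higher attention scores. The function names, the min-max normalization, and the `base_scale` and `boost` parameters are all hypothetical choices made for illustration.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalize attention scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All patches equally attended: no patch gets extra guidance.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def token_weighted_guidance(
    cond: List[List[float]],
    uncond: List[List[float]],
    attention: List[float],
    base_scale: float = 1.5,
    boost: float = 1.0,
) -> List[List[float]]:
    """Apply classifier-free guidance with a per-patch scale.

    cond / uncond: one logit vector per patch (conditional and unconditional).
    attention: one score per patch; higher means more semantically important.
    This weighting rule is an illustrative assumption, not the paper's formula.
    """
    weights = normalize(attention)
    guided = []
    for c_row, u_row, w in zip(cond, uncond, weights):
        # Patches with higher attention receive a larger guidance scale.
        scale = base_scale * (1.0 + boost * w)
        guided.append([u + scale * (c - u) for c, u in zip(c_row, u_row)])
    return guided


if __name__ == "__main__":
    cond = [[2.0, 0.0], [2.0, 0.0]]
    uncond = [[1.0, 0.0], [1.0, 0.0]]
    attention = [0.0, 1.0]  # the second patch is treated as more informative
    print(token_weighted_guidance(cond, uncond, attention))
```

With uniform attention the sketch reduces to ordinary classifier-free guidance at `base_scale`, so the per-patch weighting only changes the result where attention actually differs between patches.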
