CV, LG | Apr 24, 2025

Fast Autoregressive Models for Continuous Latent Generation

arXiv:2504.18391v1 | 5 citations | h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses the scalability gap in visual autoregressive modeling for high-fidelity image generation, representing an incremental improvement over existing methods.

The paper tackles the slow inference of masked autoregressive models for continuous image generation by proposing the Fast AutoRegressive model (FAR), which replaces the diffusion head with a lightweight shortcut head and achieves 2.3x faster inference than MAR while maintaining competitive FID and IS scores.

Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP, but their extension to continuous-domain image generation presents significant challenges. Recent work, the masked autoregressive model (MAR), bypasses quantization by modeling per-token distributions in continuous spaces using a diffusion head but suffers from slow inference due to the high computational cost of the iterative denoising process. To address this, we propose the Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head, enabling efficient few-step sampling while preserving autoregressive principles. Additionally, FAR seamlessly integrates with causal Transformers, extending them from discrete to continuous token generation without requiring architectural modifications. Experiments demonstrate that FAR achieves $2.3\times$ faster inference than MAR while maintaining competitive FID and IS scores. This work establishes the first efficient autoregressive paradigm for high-fidelity continuous-space image generation, bridging the critical gap between quality and scalability in visual autoregressive modeling.
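The few-step sampling idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the shortcut head's interface, so the code assumes a head that takes the current sample, a time `t`, a step size `d`, and a conditioning vector, and returns a direction to move in. The names `toy_shortcut_head`, `toy_backbone`, `few_step_sample`, and `generate` are all hypothetical, and the "head" is a closed-form stand-in rather than a trained network.

```python
import random


def toy_shortcut_head(x, t, d, cond):
    """Hypothetical stand-in for a learned shortcut head (an assumption).

    On a straight path from noise (t=0) to the target `cond` (t=1), the
    direction that reaches the target is (cond - x) / (1 - t). The step
    size `d` is unused by this toy rule; a learned head would condition
    on it so that large steps remain accurate.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


def few_step_sample(head, cond, num_steps=4, seed=0):
    """Draw one continuous token with `num_steps` head evaluations."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from Gaussian noise
    d = 1.0 / num_steps                      # a few large steps, not many small ones
    for k in range(num_steps):
        t = k * d
        v = head(x, t, d, cond)
        x = [xi + d * vi for xi, vi in zip(x, v)]
    return x


def toy_backbone(prev_tokens, dim):
    """Hypothetical causal backbone: maps the prefix to a condition vector."""
    if not prev_tokens:
        return [1.0] * dim
    return [0.5 * v + 1.0 for v in prev_tokens[-1]]


def generate(num_tokens, dim, num_steps=4):
    """Autoregressive loop: each token is sampled given the tokens before it."""
    tokens = []
    for i in range(num_tokens):
        cond = toy_backbone(tokens, dim)
        tokens.append(few_step_sample(toy_shortcut_head, cond, num_steps, seed=i))
    return tokens


if __name__ == "__main__":
    for token in generate(num_tokens=3, dim=2, num_steps=2):
        print(token)
```

The per-token cost is `num_steps` head evaluations, so cutting the step count is where a speedup over an iterative denoising head would come from. The outer loop stays a plain next-token loop, which is the sense in which the autoregressive structure is preserved in this sketch.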

