CV · Apr 19

Depth Adaptive Efficient Visual Autoregressive Modeling

arXiv:2604.17286 · 36.8 · h-index: 3 · Has Code

AI Analysis

For researchers and practitioners using VAR models for high-resolution image generation, DepthVAR offers a training-free method to reduce inference cost without sacrificing quality, outperforming existing hard-pruning approaches.

DepthVAR accelerates Visual Autoregressive (VAR) modeling by adaptively allocating per-token computational depth instead of pruning entire tokens, achieving 2.3×–3.1× speedup with minimal quality loss.

Visual Autoregressive (VAR) modeling inefficiently applies a fixed computational depth to each position when generating high-resolution images. While existing methods accelerate inference by pruning tokens using frequency maps, their binary hard-pruning approach is fundamentally limited and fails to improve quality even with better frequency estimation. Observing that VAR models possess significant depth redundancy, we propose a paradigm shift from pruning entire tokens to adaptively allocating per-token computational depth. To this end, we introduce DepthVAR, a training-free framework that dynamically allocates computation. It integrates an adaptive depth scheduler, which assigns computational depth via a cyclic rotated schedule for balanced, non-static refinement, with a dynamic inference process that translates these depths into layer-major masks, selectively applies transformer blocks, and blends the resulting codes to ensure each token's influence is proportional to its processing depth. Extensive experiments show that DepthVAR achieves 2.3×–3.1× acceleration with minimal quality loss, offering a competitive compute-performance trade-off compared to existing hard-pruning approaches. Code is available at https://github.com/STOVAGtz/DepthVAR.
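The pipeline described in the abstract (per-token depths, layer-major masks, selective block application, depth-proportional blending) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the round-robin rotation rule, the scalar "tokens", and the linear blend weight `depth / num_layers` are all assumptions made for illustration, and real transformer blocks would also mix information across tokens through attention.

```python
from typing import Callable, List


def cyclic_depth_schedule(num_tokens: int, depth_levels: List[int], step: int) -> List[int]:
    """Assign each token a depth, rotating the assignment by `step`.

    Hypothetical rule: token i gets depth_levels[(i + step) % k], so the
    tokens that receive the deepest processing change from step to step.
    """
    k = len(depth_levels)
    return [depth_levels[(i + step) % k] for i in range(num_tokens)]


def layer_major_masks(depths: List[int], num_layers: int) -> List[List[bool]]:
    """Build one mask per layer: masks[l][i] is True if token i runs layer l."""
    return [[d > layer for d in depths] for layer in range(num_layers)]


def run_with_depths(
    tokens: List[float],
    blocks: List[Callable[[float], float]],
    depths: List[int],
) -> List[float]:
    """Apply each block only to active tokens, then blend by depth."""
    num_layers = len(blocks)
    masks = layer_major_masks(depths, num_layers)
    hidden = list(tokens)
    for block, mask in zip(blocks, masks):
        # Inactive tokens skip the block and keep their current value.
        hidden = [block(h) if active else h for h, active in zip(hidden, mask)]
    # Assumed blend: a token's processed value is weighted by the fraction
    # of layers it actually ran; the rest comes from its unprocessed input.
    blended = []
    for x, h, d in zip(tokens, hidden, depths):
        w = d / num_layers
        blended.append(w * h + (1.0 - w) * x)
    return blended


if __name__ == "__main__":
    blocks = [lambda h: h + 1.0] * 4          # stand-in for transformer blocks
    depths = cyclic_depth_schedule(4, [1, 2, 4], step=0)
    print(depths)
    print(run_with_depths([0.0] * 4, blocks, depths))
```

In this toy setting the compute saved is simply the fraction of `False` entries across the masks. Changing `step` rotates which tokens receive the full depth, which is one plausible reading of the "balanced, non-static refinement" the abstract mentions.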

