CV · Apr 3, 2025

Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation

arXiv:2504.02612v2 · 15 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the problem of slow inference in subject-driven generation for practical applications, though it is incremental as it builds on existing visual autoregressive models.

The paper tackles the computational inefficiency of diffusion models for subject-driven image generation by proposing a fine-tuning approach for visual autoregressive models, achieving significant performance improvements over diffusion-based baselines.

Recent advances in text-to-image generative models have enabled numerous practical applications, including subject-driven generation, which fine-tunes pretrained models to capture subject semantics from only a few examples. While diffusion-based models produce high-quality images, their extensive denoising steps result in significant computational overhead, limiting real-world applicability. Visual autoregressive (VAR) models, which predict next-scale tokens rather than spatially adjacent ones, offer significantly faster inference suitable for practical deployment. In this paper, we propose the first VAR-based approach for subject-driven generation. However, naively fine-tuning VAR leads to computational overhead, language drift, and reduced diversity. To address these challenges, we introduce selective layer tuning to reduce complexity and prior distillation to mitigate language drift. Additionally, we found that the early stages have a greater influence on the generation of the subject than the later stages, which merely synthesize minor details. Based on this finding, we propose scale-wise weighted tuning, which prioritizes coarser resolutions so that the model focuses on subject-relevant information instead of local details. Extensive experiments validate that our method significantly outperforms diffusion-based baselines across various metrics and demonstrate its practical usage.
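
To make the idea of scale-wise weighted tuning concrete, here is a minimal sketch of a loss that weights coarser scales more heavily than finer ones. The abstract does not state the weighting formula, so the geometric decay, the `decay` parameter, and the function names `scale_weights` and `scale_weighted_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def scale_weights(num_scales, decay=0.7):
    """Hypothetical weights: largest for the coarsest scale (index 0),
    shrinking geometrically, normalized to sum to 1."""
    raw = [decay ** k for k in range(num_scales)]
    total = sum(raw)
    return [w / total for w in raw]

def token_cross_entropy(logits, target):
    """Cross-entropy for one token from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def scale_weighted_loss(logits_per_scale, targets_per_scale, decay=0.7):
    """Weighted sum over scales of the mean token cross-entropy.

    logits_per_scale[k] is a list of per-token logit vectors at scale k,
    ordered from coarsest to finest; targets_per_scale[k] holds the
    matching target token indices.
    """
    weights = scale_weights(len(logits_per_scale), decay)
    loss = 0.0
    for w, logits, targets in zip(weights, logits_per_scale, targets_per_scale):
        per_token = [token_cross_entropy(l, t) for l, t in zip(logits, targets)]
        loss += w * sum(per_token) / len(per_token)
    return loss
```

With `decay` below 1, an error at the coarsest scale contributes more to the total than the same error at the finest scale, which is the prioritization the abstract describes. The actual schedule used in the paper may differ.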

Code Implementations: 1 repo