CV | Apr 17, 2025

Personalized Text-to-Image Generation with Auto-Regressive Models

arXiv:2504.13162v1 | 7 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating images of specific subjects for AI and creative applications, but the advance is incremental because it adapts existing auto-regressive models to a new task.

The paper tackles personalized text-to-image generation by optimizing auto-regressive models, achieving subject fidelity and prompt following comparable to leading diffusion-based methods.

Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
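The two-stage strategy described in the abstract can be illustrated with a deliberately tiny toy, not the paper's implementation. In this sketch a small matrix `W` stands in for the transformer layers and a vector `e` stands in for the subject's text embedding; both names, the squared-error loss, the learning rate, and the choice to freeze `e` during the second stage are assumptions made only for illustration. The abstract does not specify any of these details.

```python
# Toy illustration only (assumption): a linear map W stands in for the
# transformer layers, and a vector e stands in for the subject's text embedding.

def matvec(W, e):
    """Compute W @ e for a list-of-rows matrix and a list vector."""
    return [sum(w * x for w, x in zip(row, e)) for row in W]

def loss(W, e, target):
    """Squared-error loss 0.5 * ||W e - target||^2 (a stand-in objective)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(matvec(W, e), target))

def stage1_optimize_embedding(W, e, target, lr=0.1, steps=100):
    """Stage 1: gradient descent on the embedding e only; W stays frozen."""
    e = list(e)
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(W, e), target)]
        # dL/de = W^T (W e - target)
        grad = [sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(e))]
        e = [x - lr * g for x, g in zip(e, grad)]
    return e

def stage2_finetune_weights(W, e, target, lr=0.1, steps=100):
    """Stage 2: gradient descent on W; e is frozen here (an assumption)."""
    W = [list(row) for row in W]  # copy so the caller's weights are untouched
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(W, e), target)]
        # dL/dW[i][j] = residual[i] * e[j]
        W = [[W[i][j] - lr * residual[i] * e[j] for j in range(len(e))]
             for i in range(len(W))]
    return W

W0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # "pretrained" weights
e0 = [0.0, 0.0]                              # freshly initialized embedding
target = [1.0, 2.0, 0.0]                     # stand-in for the subject's data

e1 = stage1_optimize_embedding(W0, e0, target)   # stage 1: embedding only
W2 = stage2_finetune_weights(W0, e1, target)     # stage 2: weights only

print(loss(W0, e0, target), loss(W0, e1, target), loss(W2, e1, target))
```

In this toy the embedding alone cannot fit the target exactly, so the loss plateaus after the first stage and only falls further once the weights are unfrozen. That mirrors, in miniature, why a strategy might combine both stages; whether the same dynamic holds for the actual model is not something this sketch can show.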

Code Implementations: 1 repo
