CV | Aug 10, 2025

CoAR: Concept Injection into Autoregressive Models for Personalized Text-to-Image Generation

arXiv:2508.07341v1 | 1 citation | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for efficient, effective customization in AI image generation, though the advance is incremental because it builds on existing autoregressive models.

The paper tackles personalized text-to-image generation in autoregressive models. It proposes CoAR, a framework that injects subject concepts while tuning less than 0.05% of the parameters, and reports performance competitive with recent methods.

The unified autoregressive (AR) model excels at multimodal understanding and generation, but its potential for customized image generation remains underexplored. Existing customized generation methods rely on full fine-tuning or adapters, making them costly and prone to overfitting or catastrophic forgetting. In this paper, we propose CoAR, a novel framework for injecting subject concepts into unified AR models while keeping all pre-trained parameters completely frozen. CoAR learns effective, specific subject representations with only a minimal number of parameters using a Layerwise Multimodal Context Learning strategy. To address overfitting and language drift, we further introduce regularization that preserves the pre-trained distribution and anchors context tokens to improve subject fidelity and re-contextualization. Additionally, CoAR supports training-free subject customization in a user-provided style. Experiments demonstrate that CoAR achieves superior performance on both subject-driven personalization and style personalization, while delivering significant gains in computational and memory efficiency. Notably, CoAR tunes less than 0.05% of the parameters while achieving competitive performance compared to recent Proxy-Tuning. Code: https://github.com/KZF-kzf/CoAR
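To give a feel for the "less than 0.05% of the parameters" figure, here is a rough back-of-the-envelope sketch. It assumes the only trainable weights are a small block of learnable context tokens attached to each layer of a frozen backbone. The layer count, tokens per layer, hidden size, and backbone size are hypothetical values chosen for illustration, not numbers from the paper, and the function names are invented for this example.

```python
def context_param_count(num_layers: int, tokens_per_layer: int, hidden_dim: int) -> int:
    """Parameters in one learnable (tokens_per_layer x hidden_dim) context per layer."""
    return num_layers * tokens_per_layer * hidden_dim


def trainable_fraction(trainable: int, frozen: int) -> float:
    """Share of all parameters (trainable + frozen) that receive gradient updates."""
    return trainable / (trainable + frozen)


# Hypothetical configuration, NOT taken from the paper.
NUM_LAYERS = 32
TOKENS_PER_LAYER = 16
HIDDEN_DIM = 4096
FROZEN_BACKBONE_PARAMS = 7_000_000_000

trainable = context_param_count(NUM_LAYERS, TOKENS_PER_LAYER, HIDDEN_DIM)
fraction = trainable_fraction(trainable, FROZEN_BACKBONE_PARAMS)

if __name__ == "__main__":
    print(f"trainable parameters: {trainable:,}")
    print(f"trainable share: {fraction:.4%}")
    print("under 0.05% budget:", fraction < 0.0005)
```

Under these assumed sizes the trainable share lands well below the 0.05% threshold, which shows why per-layer context tokens are cheap relative to full fine-tuning or adapter weights. The actual CoAR configuration may differ.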

