CV | Mar 16, 2025

Personalize Anything for Free with Diffusion Transformer

arXiv:2503.12590v1 | 21 citations | h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of efficient, flexible personalized image generation, though the advance appears incremental because it builds on existing DiT capabilities.

The paper tackles the problem of personalized image generation with diffusion transformers (DiTs) by proposing a training-free framework that achieves state-of-the-art performance in identity preservation and versatility, enabling layout-guided generation, multi-subject personalization, and mask-controlled editing.

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). In this paper, we uncover the untapped potential of DiT, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. This simple yet effective feature injection technique unlocks diverse scenarios, from personalization to image editing. Building upon this observation, we propose Personalize Anything, a training-free framework that achieves personalized image generation in DiT through: 1) timestep-adaptive token replacement that enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and 2) patch perturbation strategies to boost structural diversity. Our method seamlessly supports layout-guided generation, multi-subject personalization, and mask-controlled editing. Evaluations demonstrate state-of-the-art performance in identity preservation and versatility. Our work establishes new insights into DiTs while delivering a practical paradigm for efficient personalization.
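The token replacement idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: tokens are plain Python lists rather than DiT latents, the function names (`replace_tokens`, `perturb_patches`, `adaptive_inject`) and the threshold `tau=0.8` are hypothetical, the perturbation is assumed to be a simple shuffle of reference tokens inside the subject mask, and the late stage here merely stops injecting, standing in for whatever regularization the paper actually applies.

```python
import random


def replace_tokens(denoise_tokens, ref_tokens, mask):
    """Copy reference tokens into the positions where mask is True."""
    return [list(r) if m else list(d)
            for d, r, m in zip(denoise_tokens, ref_tokens, mask)]


def perturb_patches(ref_tokens, mask, rng):
    """Shuffle reference tokens among masked positions (assumed perturbation)."""
    idx = [i for i, m in enumerate(mask) if m]
    shuffled = idx[:]
    rng.shuffle(shuffled)
    out = [list(t) for t in ref_tokens]
    for dst, src in zip(idx, shuffled):
        out[dst] = list(ref_tokens[src])
    return out


def adaptive_inject(denoise_tokens, ref_tokens, mask, step, num_steps,
                    tau=0.8, rng=None):
    """Inject reference tokens only during the first tau fraction of steps.

    tau is a hypothetical threshold; later steps are left untouched here.
    """
    if step < tau * num_steps:
        source = (perturb_patches(ref_tokens, mask, rng)
                  if rng is not None else ref_tokens)
        return replace_tokens(denoise_tokens, source, mask)
    return [list(t) for t in denoise_tokens]


# Toy usage: 4 tokens of dimension 2, subject occupies positions 1 and 2.
x = [[0.0, 0.0] for _ in range(4)]
ref = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
mask = [False, True, True, False]

early = adaptive_inject(x, ref, mask, step=0, num_steps=10)
late = adaptive_inject(x, ref, mask, step=9, num_steps=10)
mixed = adaptive_inject(x, ref, mask, step=0, num_steps=10,
                        rng=random.Random(0))
```

In this toy setup, `early` carries the reference tokens at the masked positions, `late` is an unchanged copy of `x`, and `mixed` holds the same reference tokens at the masked positions but possibly in a different order.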

