Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
This work addresses the challenge of incorporating individual user preferences into image generation with text-to-image (T2I) models. The contribution is incremental, since it builds on existing diffusion model frameworks.
The paper tackles inaccurate personalization in text-to-image diffusion models, which it attributes to the limited input token capacity of these models. It proposes DrUM, a method that performs condition-level modeling with a transformer-based adapter. The authors report strong performance on large-scale datasets and compatibility with open-source models.
Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.
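To make the contrast between prompt-level and condition-level modeling concrete, the following is a minimal sketch and not DrUM itself. It assumes the prompt and each item of user history have already been encoded into fixed-size vectors by some text encoder, and it swaps the paper's transformer-based adapter for a plain weighted blend. The names `profile_embedding`, `personalize_condition`, and `alpha` are hypothetical and chosen only for illustration. The point it shows is that personalization applied to the condition vector leaves the prompt text, and therefore its token count, unchanged.

```python
from typing import List, Sequence

Vector = List[float]


def profile_embedding(history: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Summarize user history as a weighted average of its embeddings.

    Assumption: a weighted mean stands in for the paper's user profiling.
    """
    if not history or len(history) != len(weights):
        raise ValueError("history and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(history[0])
    return [
        sum(w * vec[i] for vec, w in zip(history, weights)) / total
        for i in range(dim)
    ]


def personalize_condition(prompt: Vector, profile: Vector, alpha: float) -> Vector:
    """Blend the prompt condition with the user profile in embedding space.

    Assumption: a linear blend replaces the transformer-based adapter.
    alpha = 0 returns the prompt condition unchanged.
    """
    if len(prompt) != len(profile):
        raise ValueError("prompt and profile must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * p + alpha * u for p, u in zip(prompt, profile)]


# Toy 4-dimensional embeddings; real text-encoder outputs are much larger.
prompt_vec = [1.0, 0.0, 0.0, 0.0]
history_vecs = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
history_weights = [3.0, 1.0]  # hypothetical preference strengths

profile = profile_embedding(history_vecs, history_weights)
condition = personalize_condition(prompt_vec, profile, alpha=0.5)
print(condition)  # [0.5, 0.375, 0.125, 0.0]
```

The personalized condition has the same dimension as the original prompt embedding, so in this sketch it could be passed to a diffusion model wherever the unmodified condition would go. A prompt-level approach would instead append preference text to the prompt and risk exceeding the encoder's token limit.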