CV
May 15, 2025

IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation

arXiv:2505.10743v1
h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses subject fidelity in personalized image generation, a problem relevant to users of AI art tools, and represents an incremental improvement over prior approaches.

The paper tackles the challenge of personalizing text-to-image diffusion models to represent novel subjects from a few reference images without catastrophic forgetting or overfitting. It reports a DINO similarity score of 0.789 on SDXL, which the authors state outperforms existing personalized text-to-image approaches.
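For readers unfamiliar with the metric, a DINO similarity score is commonly computed as the average cosine similarity between DINO image embeddings of generated images and of the reference images. The sketch below illustrates only that averaging step on toy vectors. The function names are hypothetical, the embeddings are assumed to come from some DINO encoder that is not shown, and the paper's exact evaluation protocol is not described in this summary.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(generated, references):
    """Average cosine similarity over every (generated, reference) pair.

    Assumption: each element is an embedding already produced by an
    image encoder (not shown here); this is an illustrative helper,
    not the paper's evaluation code.
    """
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    if not scores:
        raise ValueError("need at least one generated and one reference embedding")
    return sum(scores) / len(scores)
```

A score near 1 means the generated subject's embeddings point in almost the same direction as the reference embeddings; a score near 0 means they are nearly unrelated.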

Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects based on a few reference images remains challenging. This often leads to catastrophic forgetting, overfitting, or large computational overhead. We propose a two-stage pipeline that addresses these limitations by leveraging LoRA-based fine-tuning on the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. First, we use the unmodified SDXL to generate a generic scene by replacing the subject with its class label. Then, we selectively insert the personalized subject through a segmentation-driven image-to-image (Img2Img) pipeline that uses the trained LoRA weights. This framework isolates the subject encoding from the overall composition, thus preserving SDXL's broader generative capabilities while integrating the new subject in a high-fidelity manner. Our method achieves a DINO similarity score of 0.789 on SDXL, outperforming existing personalized text-to-image approaches.
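The data flow of the two stages can be illustrated with a small, model-free sketch. It is a simplification under stated assumptions, not the authors' implementation: `to_generic_prompt` and `blend_with_mask` are hypothetical helpers, the `sks dog` identifier phrase is a placeholder, images are toy 2D lists of floats, and a pixel-wise masked blend stands in for the segmentation-driven Img2Img step, which in the paper is performed by SDXL with the trained LoRA weights.

```python
def to_generic_prompt(prompt, subject_phrase, class_label):
    """Stage 1 (sketch): swap the personalized subject for its class label.

    The resulting prompt would be given to the unmodified base model to
    produce a generic scene. All names here are illustrative assumptions.
    """
    if subject_phrase not in prompt:
        raise ValueError("subject phrase not found in prompt")
    return prompt.replace(subject_phrase, class_label)


def blend_with_mask(generic, personalized, mask):
    """Stage 2 (simplified stand-in): change only the masked region.

    `mask` holds values in [0, 1]; 1 marks pixels belonging to the
    segmented subject. A real pipeline would run image-to-image
    diffusion inside the mask rather than a linear blend.
    """
    return [
        [m * p + (1.0 - m) * g for g, p, m in zip(g_row, p_row, m_row)]
        for g_row, p_row, m_row in zip(generic, personalized, mask)
    ]


if __name__ == "__main__":
    scene_prompt = to_generic_prompt(
        "a photo of a sks dog on a beach", "sks dog", "dog"
    )
    print(scene_prompt)

    generic_scene = [[0.2, 0.2], [0.2, 0.2]]   # toy output of stage 1
    subject_render = [[0.9, 0.9], [0.9, 0.9]]  # toy personalized content
    subject_mask = [[1.0, 0.0], [0.0, 0.0]]    # toy segmentation mask
    print(blend_with_mask(generic_scene, subject_render, subject_mask))
```

The point of the sketch is the separation of concerns the abstract describes: pixels outside the mask come from the generic scene unchanged, so the overall composition is not affected by the personalized weights.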

