CV | May 15, 2025

IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation

Amritanshu Tiwari, Cherish Puniani, Kaustubh Sharma, Ojasva Nema

arXiv:2505.10743v1 | 3.6 | h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses subject fidelity in personalized text-to-image generation, a concern for users of AI art tools, and represents an incremental improvement over prior approaches.

The paper tackles the challenge of personalizing text-to-image diffusion models to represent novel subjects from a few reference images without catastrophic forgetting or overfitting. It reports a DINO similarity score of 0.789 on SDXL, which the authors state outperforms existing personalized text-to-image approaches.
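For readers unfamiliar with the metric, a DINO similarity score is commonly computed as the mean pairwise cosine similarity between DINO ViT embeddings of generated images and reference images of the subject. The sketch below assumes that definition (this listing does not spell out the paper's exact evaluation protocol) and takes precomputed embeddings as plain lists of floats. The function names are illustrative, not from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(
    generated: Sequence[Vector], reference: Sequence[Vector]
) -> float:
    """Average cosine similarity over every (generated, reference) pair.

    Assumption: the embeddings were already extracted by an image encoder
    such as DINO; this function only aggregates them.
    """
    if not generated or not reference:
        raise ValueError("both embedding sets must be non-empty")
    scores = [cosine_similarity(g, r) for g in generated for r in reference]
    return sum(scores) / len(scores)
```

Higher values indicate that generated images sit closer to the reference subject in the encoder's feature space. A score of 1.0 would mean every pair of embeddings points in exactly the same direction.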

Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects based on a few reference images remains challenging. This often leads to catastrophic forgetting, overfitting, or large computational overhead.

We propose a two-stage pipeline that addresses these limitations by leveraging LoRA-based fine-tuning on the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. First, we use the unmodified SDXL to generate a generic scene by replacing the subject with its class label. Then, we selectively insert the personalized subject through a segmentation-driven image-to-image (Img2Img) pipeline that uses the trained LoRA weights.

This framework isolates the subject encoding from the overall composition, thus preserving SDXL's broader generative capabilities while integrating the new subject in a high-fidelity manner. Our method achieves a DINO similarity score of 0.789 on SDXL, outperforming existing personalized text-to-image approaches.
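The control flow of the two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the authors' implementation: the three model steps (base SDXL generation, segmentation, LoRA-backed Img2Img) are passed in as hypothetical callables, the subject phrase `"sks dog"` in the usage note is an invented placeholder, and reusing the original prompt in stage two is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = Any  # placeholder type; a real pipeline would use e.g. PIL images
Mask = Any   # placeholder type for a segmentation mask


def to_generic_prompt(prompt: str, subject_phrase: str, class_label: str) -> str:
    """Replace the personalized subject phrase with its plain class label."""
    if subject_phrase not in prompt:
        raise ValueError("prompt does not mention the subject phrase")
    return prompt.replace(subject_phrase, class_label)


@dataclass
class TwoStageResult:
    generic_prompt: str
    scene: Image
    mask: Mask
    final: Image


def two_stage_generate(
    prompt: str,
    subject_phrase: str,
    class_label: str,
    base_generate: Callable[[str], Image],
    segment: Callable[[Image, str], Mask],
    personalized_img2img: Callable[[Image, Mask, str], Image],
) -> TwoStageResult:
    """Run both stages; every callable is a hypothetical stand-in for a model."""
    # Stage 1: the unmodified base model renders a generic scene.
    generic_prompt = to_generic_prompt(prompt, subject_phrase, class_label)
    scene = base_generate(generic_prompt)

    # Stage 2: locate the generic class instance, then regenerate only that
    # region with the LoRA-adapted model (assumed to take the original prompt).
    mask = segment(scene, class_label)
    final = personalized_img2img(scene, mask, prompt)
    return TwoStageResult(generic_prompt, scene, mask, final)
```

With a prompt such as `"a photo of sks dog on a beach"`, subject phrase `"sks dog"`, and class label `"dog"`, stage one would receive `"a photo of dog on a beach"`. Keeping the model calls behind plain callables mirrors the separation the abstract emphasizes: the base model handles composition, and the LoRA weights are only involved in the masked subject region.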
