CV · Sep 23, 2024

Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images

arXiv:2409.16174v13.72 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This incremental improvement addresses reliability issues for users in fields like art, design, and advertising who rely on text-to-image models.

The study addresses text-to-image diffusion models producing aberrant images for certain prompts. It fine-tunes Stable Diffusion 3 with DreamBooth. The fine-tuned model performs better in visual evaluation and on metrics such as SSIM, PSNR, and FID, and user surveys show a higher preference for it.

Since the advent of GANs and VAEs, image generation models have continuously evolved, and the introduction of Stable Diffusion and DALL-E models has opened up various real-world applications. These text-to-image models can generate high-quality images for fields such as art, design, and advertising. However, they often produce aberrant images for certain prompts. This study proposes a method to mitigate such issues by fine-tuning the Stable Diffusion 3 model using the DreamBooth technique. Experimental results targeting the prompt "lying on the grass/street" demonstrate that the fine-tuned model shows improved performance in visual evaluation and on metrics such as Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Fréchet Inception Distance (FID). User surveys also indicated a higher preference for the fine-tuned model. This research is expected to contribute to enhancing the practicality and reliability of text-to-image models.
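For readers unfamiliar with the reported metrics, the sketch below shows how PSNR and a simplified SSIM are commonly computed between a reference image and a generated image. It is not the authors' evaluation code. It assumes each image is a flat list of 8-bit grayscale pixel values, and the function names `psnr` and `global_ssim` are illustrative. The SSIM here is evaluated once over the whole image, whereas the standard metric averages the same expression over local windows. FID is omitted because it requires features from a pretrained Inception network.

```python
import math


def mse(ref, out):
    """Mean squared error between two equally sized pixel lists."""
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(ref, out)) / len(ref)


def psnr(ref, out, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the reference."""
    err = mse(ref, out)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def global_ssim(ref, out, max_value=255.0):
    """Single-window SSIM over the whole image (a simplification, see lead-in)."""
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    n = len(ref)
    mu_r = sum(ref) / n
    mu_o = sum(out) / n
    var_r = sum((x - mu_r) ** 2 for x in ref) / n
    var_o = sum((y - mu_o) ** 2 for y in out) / n
    cov = sum((x - mu_r) * (y - mu_o) for x, y in zip(ref, out)) / n
    # Stabilizing constants with the conventional K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_r * mu_o + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_o ** 2 + c1) * (var_r + var_o + c2)
    return numerator / denominator


# Hypothetical 2x2 grayscale images, flattened row by row.
reference = [100, 100, 100, 100]
generated = [110, 90, 100, 100]
print(psnr(reference, generated))
print(global_ssim(reference, generated))
```

Higher PSNR and SSIM indicate a generated image closer to its reference, while lower FID indicates generated images whose feature statistics are closer to those of real images.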
