Evaluating Robustness in Latent Diffusion Models via Embedding-Level Augmentation
This work addresses robustness issues in latent diffusion models for image generation; it is an incremental improvement focused on a specific domain.
The paper tackles the lack of robustness in latent diffusion models by proposing novel data augmentation techniques and a tailored evaluation pipeline. It fine-tunes Stable Diffusion models with Dreambooth to reveal and address robustness shortcomings in how textual prompts are processed.
Latent diffusion models (LDMs) achieve state-of-the-art performance across various tasks, including image generation and video synthesis. However, they generally lack robustness, a limitation that remains underexplored in current research. In this paper, we propose several methods to address this gap. First, we hypothesize that the robustness of LDMs should primarily be measured without their text encoder, because evaluating the full architecture conflates failures of the image generator with failures of the text encoder. Second, we introduce novel data augmentation techniques designed to reveal robustness shortcomings in LDMs when processing diverse textual prompts. We then fine-tune Stable Diffusion 3 and Stable Diffusion XL models using Dreambooth, incorporating the proposed augmentation methods across multiple tasks. Finally, we propose a novel evaluation pipeline specifically tailored to assess the robustness of LDMs fine-tuned via Dreambooth.
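The abstract does not specify which embedding-level augmentations are used. As an illustration only, the sketch below applies two generic augmentations directly to prompt embeddings, bypassing the text encoder in line with the hypothesis above: additive Gaussian noise and linear interpolation between two prompts. These two techniques, the function names `gaussian_perturb` and `interpolate`, and the toy token-by-dimension list layout are all hypothetical assumptions, not the paper's method.

```python
import random
from typing import List

# Hypothetical layout: a prompt embedding is one float vector per token
# (real encoders produce e.g. 77 tokens x 768+ dims; tiny sizes used here).
Embedding = List[List[float]]


def gaussian_perturb(emb: Embedding, sigma: float, seed: int = 0) -> Embedding:
    """Return a copy of `emb` with i.i.d. N(0, sigma^2) noise added to every entry.

    Assumed augmentation, shown for illustration; seeded for reproducibility.
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in token] for token in emb]


def interpolate(emb_a: Embedding, emb_b: Embedding, alpha: float) -> Embedding:
    """Token-wise linear mix (1 - alpha) * emb_a + alpha * emb_b.

    Assumed augmentation: alpha = 0 returns emb_a, alpha = 1 returns emb_b.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same number of tokens")
    return [
        [(1.0 - alpha) * a + alpha * b for a, b in zip(tok_a, tok_b)]
        for tok_a, tok_b in zip(emb_a, emb_b)
    ]


if __name__ == "__main__":
    clean = [[0.0, 1.0], [2.0, 3.0]]
    other = [[1.0, 1.0], [0.0, 3.0]]
    print(interpolate(clean, other, 0.5))  # [[0.5, 1.0], [1.0, 3.0]]
    print(gaussian_perturb(clean, sigma=0.1, seed=42))
```

In a full pipeline, the augmented tensors would be fed to the denoiser in place of the text encoder's output. For example, the `diffusers` pipelines for Stable Diffusion XL and Stable Diffusion 3 accept precomputed `prompt_embeds` and `pooled_prompt_embeds`, which means the image generator can be probed independently of the text encoder.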