LG AI
Oct 28, 2025

Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models

Byeonghu Na, Mina Kang, Jiseok Kwak, Minsang Park, Jiwoo Shin, SeJoon Jun, Gayoung Lee, Jin-Hwa Kim, Il-Chul Moon

arXiv:2510.24012v1 | 2 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses safety concerns in text-to-image generation for users and developers, offering an incremental improvement over prior methods.

The paper tackles the generation of harmful or biased images by text-to-image diffusion models. It proposes a training-free method that adjusts the text embeddings during sampling, and it reports better removal of unsafe content than existing baselines while preserving the semantic intent of the prompt.

Text-to-image models have recently made significant advances in generating realistic and semantically coherent images, driven by advanced diffusion models and large-scale web-crawled datasets. However, these datasets often contain inappropriate or biased content, raising concerns about the generation of harmful outputs when provided with malicious text prompts. We propose Safe Text embedding Guidance (STG), a training-free approach to improve the safety of diffusion models by guiding the text embeddings during sampling. STG adjusts the text embeddings based on a safety function evaluated on the expected final denoised image, allowing the model to generate safer outputs without additional training. Theoretically, we show that STG aligns the underlying model distribution with safety constraints, thereby achieving safer outputs while minimally affecting generation quality. Experiments on various safety scenarios, including nudity, violence, and artist-style removal, show that STG consistently outperforms both training-based and training-free baselines in removing unsafe content while preserving the core semantic intent of input prompts. Our code is available at https://github.com/aailab-kaist/STG.
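
To make the guidance idea in the abstract concrete, here is a minimal toy sketch of one embedding update: estimate the final denoised image from the current noisy sample, score it with a safety function, and nudge the text embedding along the gradient of that score. Everything in it is an assumption for illustration and is not taken from the paper or its repository: `toy_denoise` is a hypothetical linear stand-in for the diffusion model's denoised estimate, `safety` is a hypothetical score that penalizes one "unsafe" direction, and the gradient is approximated with finite differences rather than backpropagation.

```python
# Toy illustration only: all functions here are hypothetical stand-ins,
# not the STG implementation.

def toy_denoise(x_t, c):
    """Stand-in for the expected final denoised image given noisy x_t and embedding c."""
    return [0.5 * x + ci for x, ci in zip(x_t, c)]

def safety(x0):
    """Toy safety score (higher is safer): penalize the component along an 'unsafe' axis."""
    unsafe_dir = [1.0, 0.0]
    proj = sum(a * b for a, b in zip(x0, unsafe_dir))
    return -proj ** 2

def grad_wrt_embedding(x_t, c, eps=1e-4):
    """Central finite-difference gradient of safety(toy_denoise(x_t, c)) with respect to c."""
    grad = []
    for i in range(len(c)):
        c_plus = list(c)
        c_minus = list(c)
        c_plus[i] += eps
        c_minus[i] -= eps
        diff = safety(toy_denoise(x_t, c_plus)) - safety(toy_denoise(x_t, c_minus))
        grad.append(diff / (2 * eps))
    return grad

def guide_embedding(x_t, c, step_size=0.1):
    """One guidance step: move the embedding in the direction that increases safety."""
    g = grad_wrt_embedding(x_t, c)
    return [ci + step_size * gi for ci, gi in zip(c, g)]

x_t = [1.0, 1.0]   # toy noisy sample
c = [1.0, 0.5]     # toy text embedding
c_guided = guide_embedding(x_t, c)
```

In this toy setup only the embedding coordinate that drives the penalized direction changes, which loosely mirrors the abstract's claim of improving safety while leaving the rest of the prompt's meaning intact. A real sampler would repeat such a step at each denoising iteration and differentiate through the actual model.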
