Penalizing Boundary Activation for Object Completeness in Diffusion Models
This work addresses a common limitation of diffusion models for text-to-image generation and improves object completeness for downstream applications. Its contribution is incremental, however: it modifies existing pre-trained models rather than retraining them.
The study tackled incomplete object generation in diffusion-based text-to-image synthesis. It revealed that RandomCrop augmentation during training disrupts object continuity, and it proposed a training-free method that penalizes boundary activations to improve object integrity, yielding substantial gains in image quality.
Diffusion models have emerged as a powerful technique for text-to-image (T2I) generation, creating high-quality, diverse images across various domains. However, a common limitation of these models is incomplete object generation: objects appear fragmented or with missing parts, which undermines the model's performance in downstream applications. In this study, we conduct an in-depth analysis of the incompleteness issue and reveal that the primary factor behind incomplete object generation is the use of RandomCrop during model training. Although this widely used data augmentation method enhances a model's ability to generalize, it disrupts object continuity during training. To address this, we propose a training-free solution that penalizes activation values at image boundaries during the early denoising steps. Our method is easily applicable to pre-trained Stable Diffusion models with minimal modifications and negligible computational overhead. Extensive experiments demonstrate the effectiveness of our method, showing substantial improvements in object integrity and image quality.
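The general idea of penalizing boundary activations only during early denoising steps can be sketched as follows. This is not the authors' implementation. The abstract does not specify which activations are penalized or how, so the sketch assumes a generic 2D activation map, a border band `margin` cells wide, and a multiplicative penalty `strength` applied only while `step < early_steps`. All names and parameters here are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def in_border(i: int, j: int, h: int, w: int, margin: int) -> bool:
    """True if cell (i, j) lies within `margin` cells of any edge of an h x w map."""
    return i < margin or j < margin or i >= h - margin or j >= w - margin


def penalize_boundary(act: Grid, step: int, early_steps: int,
                      margin: int, strength: float) -> Grid:
    """Return a copy of `act` with border activations scaled down (hypothetical sketch).

    The penalty is applied only while `step < early_steps`, where step 0 is the
    first (noisiest) denoising step. `strength` in [0, 1] is the fraction removed:
    0 leaves the map unchanged, 1 zeroes the border band.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if step >= early_steps:
        # Later steps: return an unmodified copy so the input is never mutated.
        return [row[:] for row in act]
    h, w = len(act), len(act[0])
    return [[v * (1.0 - strength) if in_border(i, j, h, w, margin) else v
             for j, v in enumerate(row)]
            for i, row in enumerate(act)]


if __name__ == "__main__":
    ones = [[1.0] * 4 for _ in range(4)]
    for row in penalize_boundary(ones, step=0, early_steps=5, margin=1, strength=0.5):
        print(row)
```

Restricting the penalty to early steps reflects the abstract's framing: the coarse layout, including whether an object runs off the frame, is plausibly settled early in denoising, so later steps can refine details without interference. In a real pipeline, the same masking would have to be applied to whatever tensor the method actually targets.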