Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
This work addresses the challenge of giving users precise layout control in text-to-image generation, offering a novel training-free approach.
The paper tackles layout control in text-to-image synthesis by introducing Zero-Painter, a training-free framework that generates high-fidelity images from object masks and per-object descriptions. Compared with state-of-the-art methods, it better preserves textual details and adheres more closely to mask shapes.
We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts. Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity. Zero-Painter employs a two-stage process involving our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, ensuring precise alignment of generated objects with textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes.
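The abstract names the Region-Grouped Cross-Attention (ReGCA) block but does not spell out its internals. The following is a minimal NumPy sketch of one plausible reading of "region-grouped" cross-attention, offered only as an illustration and not as the paper's implementation. Each latent pixel is assigned the group id of the object mask it falls in, and it may attend only to the text tokens from that object's description. The function name `region_grouped_cross_attention`, the integer group-id encoding, and the use of group 0 for background or global-prompt tokens are all hypothetical choices. PACA is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries of -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def region_grouped_cross_attention(q, k, v, pixel_region, token_region):
    """Hypothetical sketch of region-grouped cross-attention.

    q: (P, d) queries, one per latent pixel.
    k, v: (T, d) keys/values, one per text token.
    pixel_region: (P,) int group id of the mask each pixel lies in.
    token_region: (T,) int group id of the prompt each token came from.
    Each pixel attends only to tokens that share its group id.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                         # (P, T) scaled dot products
    allowed = pixel_region[:, None] == token_region[None, :]
    if not allowed.any(axis=1).all():
        # A pixel whose group has no tokens would get an all -inf row (NaN softmax).
        raise ValueError("every pixel group needs at least one token")
    scores = np.where(allowed, scores, -np.inf)           # block cross-group attention
    attn = softmax(scores, axis=-1)
    return attn @ v, attn
```

As a usage example, consider a toy layout with 4 pixels and 5 tokens, where group 0 is the background or global prompt and groups 1 and 2 are two objects: `region_grouped_cross_attention(q, k, v, np.array([0, 1, 1, 2]), np.array([0, 0, 1, 1, 2]))`. The masking is a hard constraint, so a pixel inside object 1's mask receives zero weight from every token outside object 1's description.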