CV · Jun 6, 2024

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

arXiv:2406.04032v1 · 13 citations
AI Analysis

This paper addresses the challenge of precise layout control in text-to-image generation with a training-free approach.

The paper tackles layout control in text-to-image synthesis by introducing Zero-Painter, a training-free framework that generates images from object masks and per-object descriptions. The authors report that it outperforms state-of-the-art methods at preserving textual details and adhering to mask shapes.

We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts. Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity. Zero-Painter employs a two-stage process involving our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, ensuring precise alignment of generated objects with textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes.
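The abstract does not spell out how PACA or ReGCA are computed, so the following is not the paper's method. It is only a minimal NumPy sketch of the general idea of restricting cross-attention by region: each image location may attend to the text tokens describing its own masked object, plus the tokens of the global prompt. The function name, the integer region-id convention, and the use of `-1` for global tokens are all assumptions made for illustration.

```python
import numpy as np


def region_masked_cross_attention(q, k, v, pixel_region, token_region):
    """Illustrative region-restricted cross-attention (hypothetical helper).

    q: (P, d) queries, one per image location.
    k: (T, d) keys, one per text token.
    v: (T, d_v) values, one per text token.
    pixel_region: (P,) integer region id of each image location.
    token_region: (T,) integer region id of each token; -1 marks a
        global-prompt token that every location may attend to (assumed convention).
    Returns (output, weights) with shapes (P, d_v) and (P, T).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # scaled dot-product scores, shape (P, T)

    # A location may attend to tokens of its own region or to global tokens.
    same_region = pixel_region[:, None] == token_region[None, :]
    is_global = token_region[None, :] == -1
    allowed = same_region | is_global
    if not allowed.any(axis=1).all():
        raise ValueError("every image location needs at least one allowed token")

    # Disallowed pairs get -inf so they receive zero weight after the softmax.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))             # 4 image locations
k = rng.normal(size=(3, 8))             # 3 text tokens
v = rng.normal(size=(3, 5))
pixel_region = np.array([0, 1, 1, 2])   # 0 = background, 1 and 2 = object masks
token_region = np.array([-1, 1, 2])     # one global token, one token per object
out, weights = region_masked_cross_attention(q, k, v, pixel_region, token_region)
```

In this toy setup the background location can only use the global token, while each object location splits its attention between the global token and its own object's token. A real diffusion pipeline would apply such a restriction inside the denoiser's attention layers at several resolutions, which this sketch does not attempt.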

Code Implementations: 1 repo
