CV | Jan 4, 2023

Attribute-Centric Compositional Text-to-Image Generation

arXiv:2301.01413v1 | 20 citations | h-index: 52
Originality: Incremental advance
AI Analysis

This work addresses fairness and robustness concerns in text-to-image generation for applications that require diverse attribute combinations.

The paper tackles the tendency of text-to-image models to struggle with underrepresented attribute compositions while overfitting to common ones. It proposes the ACTIG framework, which improves generation of underrepresented attributes, reduces overfitting, and is reported to achieve better image quality and text-image consistency than previous methods.

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency.
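
The abstract names an attribute-centric contrastive loss but does not give its form. As a rough illustration of the general idea only, the sketch below computes a generic InfoNCE-style contrastive loss between image embeddings and attribute-text embeddings. It is not the ACTIG loss: the function names (`cosine`, `info_nce`), the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(image_embs, attr_embs, temperature=0.1):
    """Generic InfoNCE loss (hypothetical stand-in, not the paper's loss).

    image_embs[i] is treated as the positive match for attr_embs[i];
    every other attribute embedding in the batch acts as a negative.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_embs[i], attr_embs[j]) / temperature
                  for j in range(n)]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching pair.
        total += log_denom - logits[i]
    return total / n

# Toy embeddings (assumed): two images and two attribute descriptions.
images = [[1.0, 0.0], [0.0, 1.0]]
attrs_matched = [[1.0, 0.0], [0.0, 1.0]]
attrs_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(images, attrs_matched)
loss_swapped = info_nce(images, attrs_swapped)
```

In this toy setup the loss is small when each image aligns with its own attribute embedding and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward. How ACTIG makes the loss attribute-centric is described in the paper itself, not here.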

