CV | Apr 5, 2022

DT2I: Dense Text-to-Image Generation from Region Descriptions

arXiv:2204.02035v1 | 5 citations | h-index: 59
Originality: Incremental advance
AI Analysis

This work addresses the need for more intuitive image generation in applications such as design or visualization, though it is an incremental advance that combines layout-to-image and text-to-image approaches.

The paper tackles the generation of realistic images of complex scenes by introducing dense text-to-image (DT2I) synthesis, in which images are generated from region descriptions. It proposes DTC-GAN together with a multi-modal region feature matching loss and reports plausible images of complex scenes.

Despite astonishing progress, generating realistic images of complex scenes remains a challenging problem. Recently, layout-to-image synthesis approaches have attracted much interest by conditioning the generator on a list of bounding boxes and corresponding class labels. However, previous approaches are very restrictive because the set of labels is fixed a priori. Meanwhile, text-to-image synthesis methods have substantially improved and provide a flexible way for conditional image generation. In this work, we introduce dense text-to-image (DT2I) synthesis as a new task to pave the way toward more intuitive image generation. Furthermore, we propose DTC-GAN, a novel method to generate images from semantically rich region descriptions, and a multi-modal region feature matching loss to encourage semantic image-text matching. Our results demonstrate the capability of our approach to generate plausible images of complex scenes using region captions.
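The difference between the two kinds of conditioning described in the abstract can be made concrete with a small data-structure sketch. This is only an illustration, not the paper's code: the class names `LabeledRegion` and `DescribedRegion`, the helper `box_is_valid`, the label set `FIXED_LABELS`, and the normalized `(x, y, width, height)` box convention are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention: (x, y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]

# Hypothetical closed label set, standing in for the a priori fixed
# vocabulary that layout-to-image approaches condition on.
FIXED_LABELS = ("person", "dog", "tree", "sky")


@dataclass
class LabeledRegion:
    """Layout-to-image style input: a box plus a label from a fixed set."""
    box: Box
    label: str

    def __post_init__(self) -> None:
        if self.label not in FIXED_LABELS:
            raise ValueError(f"label {self.label!r} is not in the fixed label set")


@dataclass
class DescribedRegion:
    """Dense text-to-image style input: a box plus a free-form caption."""
    box: Box
    caption: str


def box_is_valid(box: Box) -> bool:
    """Check that a normalized box has positive size and lies inside the image."""
    x, y, w, h = box
    return x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= 1 and y + h <= 1


# The same scene expressed both ways (illustrative values only).
layout_input: List[LabeledRegion] = [
    LabeledRegion((0.1, 0.4, 0.3, 0.5), "person"),
    LabeledRegion((0.0, 0.0, 1.0, 0.4), "sky"),
]
dense_input: List[DescribedRegion] = [
    DescribedRegion((0.1, 0.4, 0.3, 0.5), "a woman in a red coat holding an umbrella"),
    DescribedRegion((0.0, 0.0, 1.0, 0.4), "a cloudy sky at dusk"),
]
```

In this sketch a label outside `FIXED_LABELS` is rejected outright, while a region caption can carry attributes and relations that no fixed label set anticipates, which is the flexibility the abstract argues for.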
