CV · LG · Apr 4, 2018

Image Generation from Scene Graphs

arXiv:1804.01622v1 · 923 citations
Originality: Incremental advance
AI Analysis

This paper addresses a limitation of existing image generation methods, which struggle with complex sentences, by generating realistic images from detailed scene descriptions. The advance is incremental: it builds on prior work in graph-based and adversarial methods.

The paper tackles image generation from complex descriptions with many objects and relationships. It proposes a method that uses scene graphs to reason explicitly about objects and their relationships, and it validates the ability to generate complex multi-object images on the Visual Genome and COCO-Stuff datasets.

To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We validate our approach on Visual Genome and COCO-Stuff, where qualitative results, ablations, and user studies demonstrate our method's ability to generate complex images with multiple objects.
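The first two stages described in the abstract (processing a scene graph, then laying objects out by bounding box) can be illustrated with a small sketch. This is not the authors' implementation: the `SceneGraph` class, `message_pass`, and `rasterize_layout` are hypothetical names, plain averaging stands in for the learned graph convolution, and the boxes are written by hand where the model would predict them. The image-synthesis and adversarial-training stages are omitted.

```python
from dataclasses import dataclass


@dataclass
class SceneGraph:
    objects: list[str]                   # one category label per node
    triples: list[tuple[int, str, int]]  # (subject index, predicate, object index)


def message_pass(graph, vectors):
    """One simplified round of message passing over the graph.

    Each node's new vector is the mean of its own vector and the vectors of
    every node it shares an edge with. Assumption: plain averaging replaces
    the learned functions a real graph convolution would use.
    """
    gathered = [[v] for v in vectors]
    for subj, _predicate, obj in graph.triples:
        gathered[subj].append(vectors[obj])
        gathered[obj].append(vectors[subj])
    return [[sum(col) / len(group) for col in zip(*group)] for group in gathered]


def rasterize_layout(boxes, size):
    """Paint object indices into a size x size grid.

    Boxes are normalized (x0, y0, x1, y1) tuples. Later objects overwrite
    earlier ones; -1 marks background. Assumption: hard overwriting is a
    simplification chosen for this sketch.
    """
    grid = [[-1] * size for _ in range(size)]
    for idx, (x0, y0, x1, y1) in enumerate(boxes):
        for row in range(int(y0 * size), int(y1 * size)):
            for col in range(int(x0 * size), int(x1 * size)):
                grid[row][col] = idx
    return grid


# A toy graph: "sheep standing on grass", "sky above grass".
graph = SceneGraph(
    objects=["sky", "grass", "sheep"],
    triples=[(2, "standing on", 1), (0, "above", 1)],
)
vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # made-up initial embeddings
updated = message_pass(graph, vectors)

# Hand-written boxes standing in for predicted ones.
boxes = [(0.0, 0.0, 1.0, 0.5), (0.0, 0.5, 1.0, 1.0), (0.25, 0.5, 0.75, 0.75)]
layout = rasterize_layout(boxes, size=4)
for row in layout:
    print(row)
```

Representing relationships as index triples keeps the graph independent of any vocabulary, and the grid returned by `rasterize_layout` plays the role of the coarse scene layout that a downstream image generator would consume.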

Code Implementations: 4 repos
