CVAIGRLGJul 28, 2017

Photographic Image Synthesis with Cascaded Refinement Networks

arXiv:1707.09405v11003 citations
Originality Incremental advance
AI Analysis

It provides a method for rendering high-resolution photographic images from semantic inputs, which is incremental as it builds on existing layout-to-image synthesis but uses a non-adversarial approach.

The paper tackles the problem of generating photographic images from semantic layouts without adversarial training, achieving more realistic results than alternatives and scaling to 2-megapixel resolution.

We present an approach to synthesizing photographic images conditioned on semantic layouts. Given a semantic label map, our approach produces an image with photographic appearance that conforms to the input layout. The approach thus functions as a rendering engine that takes a two-dimensional semantic specification of the scene and produces a corresponding photographic image. Unlike recent and contemporaneous work, our approach does not rely on adversarial training. We show that photographic images can be synthesized from semantic layouts by a single feedforward network with appropriate structure, trained end-to-end with a direct regression objective. The presented approach scales seamlessly to high resolutions; we demonstrate this by synthesizing photographic images at 2-megapixel resolution, the full resolution of our training data. Extensive perceptual experiments on datasets of outdoor and indoor scenes demonstrate that images synthesized by the presented approach are considerably more realistic than alternative approaches. The results are shown in the supplementary video at https://youtu.be/0fhUJT21-bs

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes