CV · GR · LG · Aug 17, 2017

PixelNN: Example-based Image Synthesis

arXiv:1708.05349v1 · 47 citations
Originality: Incremental advance
AI Analysis

This paper addresses limitations of deep generative models in conditional image synthesis, such as mode collapse and lack of interpretability, with applications to domains such as human faces and objects.

The authors tackle the problem of generating diverse, controllable, high-frequency photorealistic images from incomplete signals such as low-resolution images or edge maps. They propose a two-stage pipeline: a CNN produces an initial smoothed mapping, and a pixel-wise nearest-neighbor method then turns it into multiple high-quality outputs.

We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode collapse problem, and (2) they are not interpretable, making it difficult to control the synthesized output. We demonstrate that NN approaches potentially address such limitations, but suffer in accuracy on small datasets. We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to map the input to an (overly smoothed) image, and the second stage uses a pixel-wise nearest-neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner. We demonstrate our approach for various input modalities, and for various domains ranging from human faces to cats-and-dogs to shoes and handbags.
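The second stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes grayscale images, and it uses raw intensity patches as per-pixel descriptors where the paper's pipeline would take the output of a CNN. The function names `patch_features` and `pixelwise_nn` and the `radius` parameter are hypothetical. For every pixel of the smoothed query, the sketch finds the exemplar pixel whose smoothed neighborhood matches best and copies the corresponding pixel from the sharp exemplar.

```python
import numpy as np


def patch_features(img, radius=1):
    """Describe each pixel by the flattened (2r+1) x (2r+1) patch around it."""
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    size = 2 * radius + 1
    # One shifted copy of the image per patch offset, stacked on the last axis.
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(size) for dx in range(size)]
    return np.stack(shifted, axis=-1).reshape(h * w, size * size)


def pixelwise_nn(query_smooth, exemplar_smooth, exemplar_sharp, radius=1):
    """Copy, for each query pixel, the sharp exemplar pixel whose smoothed
    patch is closest in squared Euclidean distance (an assumed metric)."""
    q = patch_features(query_smooth, radius)
    e = patch_features(exemplar_smooth, radius)
    # Pairwise squared distances, shape (n_query_pixels, n_exemplar_pixels).
    dist = (q ** 2).sum(1)[:, None] - 2.0 * q @ e.T + (e ** 2).sum(1)[None, :]
    nearest = dist.argmin(axis=1)
    return exemplar_sharp.reshape(-1)[nearest].reshape(query_smooth.shape)
```

Calling `pixelwise_nn` with a different exemplar pair for the same query yields a different output, which is one simple way to picture the diversity and controllability the abstract describes: the user chooses which exemplars the pixels are copied from.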

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
