Spatial PixelCNN: Generating Images from Patches
This work addresses image generation and upscaling for computer vision applications. Its contribution is incremental: it extends existing autoregressive models with a patch-based approach.
The paper tackles the problem of generating images from small patches. It proposes Spatial PixelCNN, a conditional autoregressive model that conditions on pixel coordinates and VAE features so that it can train on patches and still reproduce full-sized images. On MNIST, it achieves performance similar to PixelCNN++ and enables upscaling of up to 50×.
In this paper, we propose Spatial PixelCNN, a conditional autoregressive model that generates images from small patches. By conditioning on a grid of pixel coordinates and global features extracted from a Variational Autoencoder (VAE), we are able to train on patches of images and reproduce the full-sized image. We show that it not only generates high-quality samples at the same resolution as the underlying dataset, but can also upscale images to arbitrary resolutions (tested at resolutions up to $50\times$) on the MNIST dataset. Compared to a PixelCNN++ baseline, Spatial PixelCNN achieves similar quantitative and qualitative performance on the MNIST dataset.
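To make the coordinate conditioning concrete, the following is a minimal sketch of how a patch and its matching coordinate grid might be prepared for training, and how a denser grid could be requested for upscaling. It is an illustration under stated assumptions, not the authors' implementation: the function names `coordinate_grid` and `sample_patch`, the normalization of coordinates to [-1, 1], the patch size of 8, and the random stand-in image are all hypothetical, and the VAE global features are omitted.

```python
import numpy as np

def coordinate_grid(height, width):
    """Return an array of shape (height, width, 2) holding (y, x) coordinates.

    Assumption: coordinates are normalized to [-1, 1]; the normalization
    actually used by the model may differ.
    """
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy, xx], axis=-1)

def sample_patch(image, patch_size, rng):
    """Crop a random square patch and the matching slice of the coordinate grid."""
    height, width = image.shape
    grid = coordinate_grid(height, width)
    # Upper bounds are exclusive, so the patch always fits inside the image.
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    rows = slice(top, top + patch_size)
    cols = slice(left, left + patch_size)
    return image[rows, cols], grid[rows, cols]

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # random stand-in for a 28x28 MNIST digit

# Training-time input: a small patch plus the coordinates of each of its pixels.
patch, coords = sample_patch(image, patch_size=8, rng=rng)

# Upscaling: sample the same [-1, 1] coordinate range more densely (here 2x).
upscaled_coords = coordinate_grid(28 * 2, 28 * 2)
```

Because each patch carries the global position of its pixels, a model conditioned on `coords` can in principle learn where the patch sits in the full image. Requesting a denser grid such as `upscaled_coords` at sampling time is one plausible way to obtain output at a higher resolution than the training data.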