CV · LG · ML · Oct 10, 2018

Unpaired High-Resolution and Scalable Style Transfer Using Generative Adversarial Networks

Andrej Junginger, Markus Hanselmann, Thilo Strauss, Sebastian Boblest, Jens Buchner, Holger Ulmer

arXiv:1810.05724v1 · 5.2 · 10 citations

Originality: Incremental advance

AI Analysis

This addresses a scalability bottleneck for researchers and practitioners in image processing, though it is an incremental improvement on existing GAN-based style transfer methods.

The paper tackles the high memory consumption of unpaired image style transfer with GANs, which limits the image sizes that can be processed. It proposes processing overlapping image subsamples instead of whole images. This approach enables translation of images of more than 50 megapixels, reduces the number of training images required, and preserves local details while maintaining global consistency.
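To make the memory argument concrete, the following back-of-the-envelope sketch compares the size of a single float32 activation tensor for a roughly 50-megapixel image with that of one 256 x 256 subsample. The 64-channel layer, the 256-pixel tile size, and the 8660 x 5774 image dimensions are illustrative assumptions, not values from the paper. The estimate also ignores weights, gradients, and the multiple generator and discriminator networks that unpaired GAN training typically involves, so real peak memory is considerably higher.

```python
def activation_bytes(height: int, width: int, channels: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of one float32 feature map with the given spatial size and channel count."""
    return height * width * channels * bytes_per_value


GIB = 2**30
MIB = 2**20

# Hypothetical setting: a single 64-channel convolutional feature map.
full_bytes = activation_bytes(8660, 5774, 64)  # whole image, about 50 megapixels
tile_bytes = activation_bytes(256, 256, 64)    # one 256 x 256 subsample

print(f"whole image: {full_bytes / GIB:.1f} GiB per feature map")
print(f"single tile: {tile_bytes / MIB:.1f} MiB per feature map")
print(f"ratio: {full_bytes / tile_bytes:.0f}x")
```

Under these assumptions one feature map of the whole image needs close to 12 GiB, while the same map for a single tile needs 16 MiB, roughly 760 times less. Bounding the input size of each forward pass therefore bounds the network's activation memory, independently of the full image resolution.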

Neural networks have proven their capabilities by outperforming many other approaches on regression and classification tasks across various kinds of data. Other astonishing results have been achieved using neural networks as data generators, especially in the setting of generative adversarial networks (GANs). One special application is image domain translation, where the goal is to take an image in a certain style (e.g., a photograph) and transform it into another style (e.g., a painting). If this task is performed with unpaired training examples, the corresponding GAN setting is complex and the neural networks are large, which leads to high peak memory consumption during both the training and evaluation phases. This limits the maximum image size that can be processed. We address this issue by not processing the whole image at once, but instead training and evaluating the domain translation on the level of overlapping image subsamples. This new approach not only enables us to translate high-resolution images that otherwise could not be processed by the neural network at once, but also allows us to work with comparatively small neural networks and limited hardware resources. Additionally, the number of images required for the training process is significantly reduced. We present high-quality results on images with a total resolution of more than 50 megapixels and demonstrate that our method helps to preserve local image details while also maintaining global consistency.
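The core idea of the abstract, translating overlapping subsamples and reassembling them into one image, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `translate_tile` is a hypothetical placeholder for a trained generator, and the tile size, the overlap, and the triangular weighting used to blend overlapping outputs are choices made here, since the abstract does not say how overlaps are merged.

```python
from typing import Callable

import numpy as np


def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last tile is flush with the image border
    return starts


def triangular_window(n: int) -> np.ndarray:
    """Strictly positive 1-D weights that peak at the tile centre and fall off towards its edges."""
    i = np.arange(1, n + 1)
    return np.minimum(i, n + 1 - i).astype(np.float64)


def translate_by_tiles(
    image: np.ndarray,
    translate_tile: Callable[[np.ndarray], np.ndarray],
    tile: int = 256,
    overlap: int = 64,
) -> np.ndarray:
    """Translate an (H, W, C) image tile by tile and blend the overlapping outputs.

    `translate_tile` is a hypothetical stand-in for a generator: it must map an
    (h, w, C) patch to an array of the same shape. Only one patch is passed to it
    at a time, so the generator's activation memory depends on `tile`, not on
    H and W (the output buffers below still scale with the image).
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    h, w, c = image.shape
    th, tw = min(tile, h), min(tile, w)
    stride = tile - overlap
    # 2-D blending weights with shape (th, tw, 1), broadcast over channels.
    window = np.outer(triangular_window(th), triangular_window(tw))[..., None]

    out = np.zeros((h, w, c), dtype=np.float64)
    weight_sum = np.zeros((h, w, 1), dtype=np.float64)
    for y in tile_starts(h, th, stride):
        for x in tile_starts(w, tw, stride):
            patch = image[y:y + th, x:x + tw]
            out[y:y + th, x:x + tw] += window * translate_tile(patch)
            weight_sum[y:y + th, x:x + tw] += window
    # Every pixel lies in at least one tile and all weights are > 0, so this division is safe.
    return out / weight_sum


# Toy check: a pointwise "translation" (colour inversion) is reproduced exactly,
# because each output pixel is a weighted average of identical per-tile values.
rng = np.random.default_rng(0)
img = rng.random((600, 900, 3))
result = translate_by_tiles(img, lambda p: 1.0 - p, tile=256, overlap=64)
print(np.allclose(result, 1.0 - img))  # True
```

Because each call to the generator sees only one tile, its memory footprint is governed by `tile` rather than by the full resolution. With a real generator, neighbouring tiles produce slightly different outputs in their shared region; the weighted average down-weights tile borders to soften visible seams, although it does not by itself guarantee the global consistency the paper reports.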
