CV | Jul 24, 2020

The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation

arXiv:2007.12568v17.212 citations | Has Code

Originality: Highly original

AI Analysis

This paper addresses a fundamental limitation of deep image-to-image translation methods in computer vision and offers a simpler alternative that trains much faster.

The paper argues that deep methods for unsupervised image-to-image translation, an ill-posed problem, succeed only because of a strong locality bias and fail on nonlocal transformations. It introduces linear encoder-decoder architectures that match state-of-the-art results on local problems in a fraction of the training time and succeed on nonlocal problems where the deep methods fail.

Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g., mapping upside-down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image-to-image translation. We show that learning is much easier and faster with these architectures, and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.
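To make the abstract's distinction concrete, the following minimal NumPy sketch illustrates two facts it relies on: turning an image upside down is a nonlocal transformation, yet it is exactly a linear map (a permutation matrix) on the flattened pixels, and a linear encoder-decoder pair can be fit in closed form. This is not the paper's training procedure. The PCA-based encoder-decoder is one common way to build a linear encoder-decoder and is an assumption here, and the names `vertical_flip_matrix` and `pca_encoder_decoder` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def vertical_flip_matrix(h, w):
    """Return the (h*w, h*w) permutation matrix that flips an h x w image upside down."""
    n = h * w
    idx = np.arange(n).reshape(h, w)
    # For each output pixel, the index of the input pixel it is copied from.
    source_idx = np.flipud(idx).ravel()
    P = np.zeros((n, n))
    P[np.arange(n), source_idx] = 1.0
    return P


def pca_encoder_decoder(X, k):
    """Fit a linear encoder/decoder from data X of shape (n_samples, n_pixels).

    Illustrative assumption: PCA via SVD, keeping the top-k components.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k]  # (k, n_pixels), orthonormal rows

    def encode(x):
        return (x - mean) @ W.T

    def decode(z):
        return z @ W + mean

    return encode, decode


rng = np.random.default_rng(0)
h, w = 4, 3

# 1) A nonlocal transformation expressed as a single linear map.
img = rng.random((h, w))
P = vertical_flip_matrix(h, w)
flipped = (P @ img.ravel()).reshape(h, w)
flip_is_linear = np.allclose(flipped, np.flipud(img))

# 2) A linear encoder-decoder on synthetic data that lies in a 2-D affine subspace.
latent = rng.standard_normal((20, 2))
basis = rng.standard_normal((2, h * w))
X = latent @ basis + rng.random(h * w)
encode, decode = pca_encoder_decoder(X, k=2)
reconstruction_error = np.abs(decode(encode(X)) - X).max()
```

The flip moves every pixel far from its original position, so a model restricted to local receptive fields has no short path to it, while a dense linear map over all pixels represents it exactly. Because the synthetic data has only two latent dimensions, the two-component linear encoder-decoder reconstructs it up to floating-point error.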

