LG · CV · ML · Dec 31, 2015

Autoencoding beyond pixels using a learned similarity metric

arXiv:1512.09300v2 · 2217 citations
Originality: Incremental advance
AI Analysis

This work addresses image generation and representation learning for computer vision, offering an incremental improvement over existing VAE methods.

The paper improves autoencoder reconstruction by replacing pixel-wise errors with feature-wise errors, measured with a similarity metric learned by a GAN discriminator. On face images this yields better visual fidelity than a VAE trained with an element-wise objective, and the learned embedding allows high-level features to be modified through simple arithmetic.
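
To make the difference between pixel-wise and feature-wise errors concrete, here is a minimal toy sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: `toy_features` is a hypothetical, hand-written stand-in (a circular 1-D convolution with global pooling) for a hidden layer of a trained GAN discriminator, and the 8-element signal stands in for an image.

```python
# Toy comparison of element-wise vs. feature-wise reconstruction error.
# ASSUMPTION: `toy_features` is a fixed, hand-written stand-in for a hidden
# layer of a trained GAN discriminator; it is not the paper's network.

def elementwise_error(x, x_rec):
    """Mean squared error computed position by position."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)

def toy_features(x, kernel=(1.0, -1.0)):
    """Circular 1-D convolution, rectification, then global max and mean pooling."""
    n = len(x)
    responses = [
        sum(k * x[(i + j) % n] for j, k in enumerate(kernel))
        for i in range(n)
    ]
    rectified = [max(r, 0.0) for r in responses]
    return [max(rectified), sum(rectified) / n]

def featurewise_error(x, x_rec, features=toy_features):
    """Mean squared error computed between feature vectors."""
    fx, fr = features(x), features(x_rec)
    return sum((a - b) ** 2 for a, b in zip(fx, fr)) / len(fx)

def shift(x, steps):
    """Circularly shift a list to the right by `steps` positions."""
    steps %= len(x)
    return x[-steps:] + x[:-steps]

signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = shift(signal, 2)  # same content, translated by two positions

pixel_err = elementwise_error(signal, shifted)
feature_err = featurewise_error(signal, shifted)
print(pixel_err, feature_err)
```

In this toy setting the element-wise error penalizes the translated copy heavily, while the pooled features are identical, so the feature-wise error vanishes. A learned discriminator layer would offer this kind of invariance only approximately.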

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.
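
The "simple arithmetic" on embeddings can be sketched as follows. This is a minimal illustration that assumes an attribute direction is estimated as the difference between the mean latent code of examples with the attribute and the mean code of examples without it. The function names and the 3-dimensional latent codes are hypothetical; in practice the codes would come from the trained encoder.

```python
# Toy latent-space arithmetic for a visual attribute such as "wearing glasses".
# ASSUMPTION: the latent codes below are made-up 3-D vectors, and the
# mean-difference recipe is one plausible way to obtain an attribute direction.

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def attribute_vector(codes_with, codes_without):
    """Difference between the mean codes with and without the attribute."""
    mu_with = mean_vector(codes_with)
    mu_without = mean_vector(codes_without)
    return [a - b for a, b in zip(mu_with, mu_without)]

def add_attribute(z, direction, strength=1.0):
    """Move a latent code along the attribute direction."""
    return [zi + strength * di for zi, di in zip(z, direction)]

with_glasses = [[1.0, 0.5, 2.0], [3.0, 0.5, 2.0]]
without_glasses = [[1.0, 0.5, 0.0], [3.0, 0.5, 0.0]]

glasses = attribute_vector(with_glasses, without_glasses)
z_edited = add_attribute([0.0, 1.0, 0.0], glasses)
print(glasses, z_edited)
```

Decoding `z_edited` with the trained decoder would then be expected to produce the original face with the attribute added. The `strength` parameter is an illustrative knob for scaling the change.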

Code Implementations: 28 repos

Data from Papers with Code (CC-BY-SA-4.0)

