LGMLMay 24, 2017

Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

arXiv:1705.08841v1334 citations
Originality Incremental advance
AI Analysis

This addresses the need for minimal supervision in representation learning for grouped data, which is incremental as it builds on existing deep probabilistic models.

The paper tackles the problem of learning disentangled representations from grouped observations, such as face images grouped by identity, and presents the Multi-Level Variational Autoencoder (ML-VAE) which achieves semantically meaningful disentanglement and generalization to unseen groups.

We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes