LG AI ST ML

Jun 20, 2022

Identifiability of deep generative models without auxiliary information

Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

arXiv:2206.10044v2

27.377 citations

h-index: 60

Originality: Highly original

AI Analysis

This work addresses a foundational issue in unsupervised learning: whether the latent structure of a generative model can be recovered from observed data alone. For researchers and practitioners, the result supports more reliable model interpretation without extra data.

The paper proves identifiability of deep generative models without auxiliary information. It shows that a broad class of models, including common VAE architectures, can be identified up to transformations such as affine mappings, which partially resolves an open problem.

We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used in practice. Unlike existing work, our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space. Specifically, we show that for a broad class of generative (i.e. unsupervised) models with universal approximation capabilities, the side information $u$ is not necessary: We prove identifiability of the entire generative model where we do not observe $u$ and only observe the data $x$. The models we consider match autoencoder architectures used in practice that leverage mixture priors in the latent space and ReLU/leaky-ReLU activations in the encoder, such as VaDE and MFC-VAE. Our main result is an identifiability hierarchy that significantly generalizes previous work and exposes how different assumptions lead to different "strengths" of identifiability, and includes certain "vanilla" VAEs with isotropic Gaussian priors as a special case. For example, our weakest result establishes (unsupervised) identifiability up to an affine transformation, and thus partially resolves an open problem regarding model identifiability raised in prior work. These theoretical results are augmented with experiments on both simulated and real data.
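The abstract's model class and its weakest guarantee can be made concrete with a small numerical sketch. The code below is an illustration only, not the authors' code or proof: it assumes a noiseless decoder, a three-component isotropic Gaussian mixture prior, and a one-hidden-layer leaky-ReLU network, and every name and dimension in it is hypothetical. It shows why an affine transformation of the latent space is an ambiguity that cannot be ruled out: an affinely transformed latent variable is still a Gaussian mixture, and folding the inverse map into the first layer gives a decoder of the same architecture that produces exactly the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(h, slope=0.2):
    # Piecewise-linear activation, as in the decoders the abstract describes.
    return np.where(h > 0, h, slope * h)

def decoder(z, W1, b1, W2, b2):
    # One-hidden-layer leaky-ReLU network (hypothetical sizes, no observation noise).
    return leaky_relu(z @ W1 + b1) @ W2 + b2

def sample_gmm(n, weights, means, scales):
    # Draw n latent points from an isotropic Gaussian mixture prior.
    k = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[k] + scales[k][:, None] * noise

d_z, d_h, d_x = 2, 16, 5
weights = np.array([0.5, 0.3, 0.2])
means = 3.0 * rng.normal(size=(3, d_z))
scales = np.array([0.5, 1.0, 0.7])
W1, b1 = rng.normal(size=(d_z, d_h)), rng.normal(size=d_h)
W2, b2 = rng.normal(size=(d_h, d_x)), rng.normal(size=d_x)

# Original model: z ~ mixture prior, x = f(z).
z = sample_gmm(1000, weights, means, scales)
x = decoder(z, W1, b1, W2, b2)

# Alternative model: latent z_alt = z A + c, which is again a Gaussian mixture.
A = np.array([[2.0, 1.0], [0.5, 1.5]])  # invertible (determinant 2.5)
c = np.array([1.0, -2.0])
z_alt = z @ A + c

# Fold the inverse affine map into the first layer; the architecture is unchanged.
A_inv = np.linalg.inv(A)
W1_alt = A_inv @ W1
b1_alt = b1 - c @ A_inv @ W1
x_alt = decoder(z_alt, W1_alt, b1_alt, W2, b2)

# Both (prior, decoder) pairs generate the same observations.
max_gap = float(np.max(np.abs(x - x_alt)))
print("max |x - x_alt| =", max_gap)
```

The two models differ in their latent variables and first-layer weights yet agree on every observation up to floating-point error, so no procedure that sees only $x$ can distinguish them. Identifiability "up to an affine transformation" says that, under the paper's assumptions, this is the only kind of ambiguity that remains.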
