LG Jun 20, 2022
Identifiability of deep generative models without auxiliary information
Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar et al.
We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used in practice. Unlike existing work, our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space. Specifically, we show that for a broad class of generative (i.e. unsupervised) models with universal approximation capabilities, the side information $u$ is not necessary: We prove identifiability of the entire generative model where we do not observe $u$ and only observe the data $x$. The models we consider match autoencoder architectures used in practice that leverage mixture priors in the latent space and ReLU/leaky-ReLU activations in the encoder, such as VaDE and MFC-VAE. Our main result is an identifiability hierarchy that significantly generalizes previous work and exposes how different assumptions lead to different "strengths" of identifiability, and includes certain "vanilla" VAEs with isotropic Gaussian priors as a special case. For example, our weakest result establishes (unsupervised) identifiability up to an affine transformation, and thus partially resolves an open problem regarding model identifiability raised in prior work. These theoretical results are augmented with experiments on both simulated and real data.
LG Oct 10, 2021
Structure learning in polynomial time: Greedy algorithms, Bregman information, and exponential families
Goutham Rajendran, Bohdan Kivva, Ming Gao et al.
Greedy algorithms have long been a workhorse for learning graphical models, and more broadly for learning statistical models with sparse structure. In the context of learning directed acyclic graphs (DAGs), greedy algorithms are popular despite their worst-case exponential runtime. In practice, however, they are very efficient. We provide new insight into this phenomenon by studying a general greedy score-based algorithm for learning DAGs. Unlike edge-greedy algorithms such as the popular GES and hill-climbing algorithms, our approach is vertex-greedy and requires at most a polynomial number of score evaluations. We then show how recent polynomial-time algorithms for learning DAG models are a special case of this algorithm, thereby illustrating how these order-based algorithms can be rigorously interpreted as score-based algorithms. This observation suggests new score functions and optimality conditions based on the duality between Bregman divergences and exponential families, which we explore in detail. Explicit sample and computational complexity bounds are derived. Finally, we provide extensive experiments suggesting that this algorithm indeed optimizes the score in a variety of settings.
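To make the vertex-greedy idea concrete, here is a minimal sketch of an order-based greedy procedure. It is not the paper's algorithm: the score used here (residual variance after linear regression on the vertices already placed) is an assumption chosen for illustration, and the names `vertex_greedy_order`, `residualize`, and `variance` are hypothetical. The point is only the evaluation count: with d vertices, the loop scores d + (d-1) + ... + 1 = d(d+1)/2 candidates, which is polynomial in d.

```python
import random


def center(col):
    """Subtract the sample mean from a column of observations."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def variance(col):
    """Sample variance of an already centered column."""
    return sum(v * v for v in col) / len(col)


def residualize(col, on):
    """Residual of `col` after least-squares regression on the single column `on`."""
    denom = sum(v * v for v in on)
    if denom == 0.0:
        return list(col)
    beta = sum(a * b for a, b in zip(col, on)) / denom
    return [a - beta * b for a, b in zip(col, on)]


def vertex_greedy_order(columns):
    """Greedily build a vertex ordering, one vertex per step.

    Assumed score (illustrative only): the variance of a vertex after
    regressing out every vertex already placed in the ordering.
    Returns the ordering and the number of score evaluations used.
    """
    resid = {j: center(col) for j, col in enumerate(columns)}
    order, evaluations = [], 0
    while resid:
        scores = {j: variance(r) for j, r in resid.items()}
        evaluations += len(scores)
        best = min(scores, key=scores.get)
        chosen = resid.pop(best)
        order.append(best)
        # Regress the chosen vertex out of all remaining ones, so the next
        # round scores each vertex conditional on everything placed so far.
        resid = {j: residualize(r, chosen) for j, r in resid.items()}
    return order, evaluations


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    x0 = [random.gauss(0, 1) for _ in range(n)]
    x1 = [a + random.gauss(0, 1) for a in x0]  # x0 -> x1
    x2 = [a + random.gauss(0, 1) for a in x1]  # x1 -> x2
    print(vertex_greedy_order([x2, x0, x1]))
```

On data from a linear chain with equal noise variances, as in the demo, the source vertex has the smallest marginal variance and each later vertex has the smallest residual variance once its ancestors are placed, so this particular score recovers a valid topological order. Other scores would need their own justification.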
LG Jun 29, 2021
Learning latent causal graphs via mixture oracles
Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar et al.
We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant than the dependence between certain high-level, latent features (e.g. concepts or objects), and this is the setting of interest. We provide conditions under which both the latent representations and the underlying latent causal model are identifiable by a reduction to a mixture oracle. These results highlight an intriguing connection between the well-studied problem of learning the order of a mixture model and the problem of learning the bipartite structure between observables and unobservables. The proof is constructive, and leads to several algorithms for explicitly reconstructing the full graphical model. We discuss efficient algorithms and provide experiments illustrating the algorithms in practice.
DS Nov 18, 2020
Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 SOS
Bohdan Kivva, Aaron Potechin
In this paper we show that simple semidefinite programs inspired by degree $4$ SOS can exactly solve the tensor nuclear norm, tensor decomposition, and tensor completion problems on tensors with random asymmetric components. More precisely, for tensor nuclear norm and tensor decomposition, we show that w.h.p. these semidefinite programs can exactly find the nuclear norm and components of an $(n\times n\times n)$-tensor $\mathcal{T}$ with $m\leq n^{3/2}/\operatorname{polylog}(n)$ random asymmetric components. Unlike most of the previous algorithms, our algorithm provides a certificate for the decomposition, does not require knowledge about the number of components in the decomposition and does not make any assumptions on the sizes of the coefficients in the decomposition. As a byproduct, we show that w.h.p. the nuclear norm decomposition exactly coincides with the minimum rank decomposition for tensors with $m\leq n^{3/2}/\operatorname{polylog}(n)$ random asymmetric components. For tensor completion, we show that w.h.p. the semidefinite program, introduced by Potechin & Steurer (2017) for tensors with orthogonal components, can exactly recover an $(n\times n\times n)$-tensor $\mathcal{T}$ with $m$ random asymmetric components from only $n^{3/2}\, m \operatorname{polylog}(n)$ randomly observed entries. For non-orthogonal tensors, this improves the dependence on $m$ of the number of entries needed for exact recovery over all previously known algorithms and provides the first theoretical guarantees for exact tensor completion in the overcomplete regime.