LGSep 25, 2023
Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time ComplexitySpencer L. Gordon, Erik Jahn, Bijan Mazaheri et al.
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/ζ)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $ζ$). The best known lower bound was $\exp(Ω(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/ζ)^{O(k)}$. We also extend the known lower bound of $e^{Ω(k)}$ to match our upper bound across a broad range of $ζ$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
LGOct 13, 2023
Identifiability of Product of Experts ModelsSpencer L. Gordon, Manav Kant, Eric Ma et al.
Product of experts (PoE) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. These were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data which satisfy many low-dimensional constraints -- thereby allowing each individual expert to perform a simple task. PoEs have found a variety of applications in learning. We study the problem of identifiability of a product of experts model having a layer of binary latent variables, and a layer of binary observables that are iid conditional on the latents. The previous best upper bound on the number of observables needed to identify the model was exponential in the number of parameters. We show: (a) When the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (and hence best possible). (b) In the more general case of arbitrarily distributed latents, the model is identifiable for a number of observables that is still linear in the number of parameters (and within a factor of two of best-possible). The proofs rely on root interlacing phenomena for some special three-term recurrences.
LGDec 22, 2021
Causal Inference Despite Limited Global Confounding via Mixture ModelsSpencer L. Gordon, Bijan Mazaheri, Yuval Rabani et al.
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite $k$-mixture of such models is graphically represented by a larger graph which has an additional ``hidden'' (or ``latent'') random variable $U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other vertex. Models of this type are fundamental to causal inference, where $U$ models an unobserved confounding effect of multiple populations, obscuring the causal relationships in the observable DAG. By solving the mixture problem and recovering the joint probability distribution with $U$, traditionally unidentifiable causal relationships become identifiable. Using a reduction to the more well-studied ``product'' case on empty graphs, we give the first algorithm to learn mixtures of non-empty DAGs.
LGJan 27, 2021
Hadamard Extensions and the Identification of Mixtures of Product DistributionsSpencer L. Gordon, Leonard J. Schulman
The Hadamard Extension of a matrix is the matrix consisting of all Hadamard products of subsets of its rows. This construction arises in the context of identifying a mixture of product distributions on binary random variables: full column rank of such extensions is a necessary ingredient of identification algorithms. We provide several results concerning when a Hadamard Extension has full column rank.
LGDec 29, 2020
Source Identification for Mixtures of Product DistributionsSpencer L. Gordon, Bijan Mazaheri, Yuval Rabani et al.
We give an algorithm for source identification of a mixture of $k$ product distributions on $n$ bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O'Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).