Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models
This work addresses the computational bottleneck in moment-based methods for latent-variable models, offering efficient algorithms for machine learning practitioners dealing with high-dimensional data, though it builds incrementally on prior work for specific cases.
The paper tackles the problem of learning latent-variable models by developing an efficient algorithm for implicit high-order moment tensor computation, which avoids the exponential complexity of explicit tensors, and applies it to achieve polynomial-time learning algorithms for mixtures of linear regressions, mixtures of spherical Gaussians, and positive linear combinations of non-linear activations, with specific error bounds and conditions.
We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for {\em implicit moment tensor computation}. Our framework generalizes the work of~\cite{LL21-opt} which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first $\mathrm{poly}(d, k)$-time learning algorithms for the following models. * {\bf Mixtures of Linear Regressions} We give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm for this task, where $ε$ is the desired error. * {\bf Mixtures of Spherical Gaussians} For density estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time learning algorithm, where $ε$ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. For parameter estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm under the {\em optimal} mean separation of $Ω(\log^{1/2}(k/ε))$. * {\bf Positive Linear Combinations of Non-Linear Activations} We give a general algorithm for this task with complexity $\mathrm{poly}(d, k) g(ε)$, where $ε$ is the desired error and the function $g$ depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity $\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/ε)}$.