Mixtures Closest to a Given Measure: A Semidefinite Programming Approach
This provides a method for determining mixture order and parameters in high-dimensional settings, useful for clustering and accelerating standard algorithms, but it is incremental as it builds on existing semidefinite programming approaches.
The paper tackles the problem of approximating a target measure, known only through its moments, using mixtures from parametric families like Gaussian or Poisson, with distances measured by 2-Wasserstein or total variation. It introduces a hierarchy of semidefinite relaxations that asymptotically converge to the optimal value, and in some cases, achieves finite convergence and recovery of the optimal mixing measure.
Mixture models, such as Gaussian mixture models, are widely used in machine learning to represent complex data distributions. A key challenge, especially in high-dimensional settings, is to determine the mixture order and estimate the mixture parameters. We study the problem of approximating a target measure, available only through finitely many of its moments, by a mixture of distributions from a parametric family (e.g., Gaussian, exponential, Poisson), with approximation quality measured by the 2-Wasserstein or the total variation distance. Unlike many existing approaches, the parameter set is not assumed to be finite; it is modeled as a compact basic semi-algebraic set. We introduce a hierarchy of semidefinite relaxations with asymptotic convergence to the desired optimal value. In addition, when a certain rank condition is satisfied, the convergence is even finite and recovery of an optimal mixing measure is obtained. We also present an application to clustering, where our framework serves either as a stand-alone method or as a preprocessing step that yields both the number of clusters and strong initial parameter estimates, thereby accelerating convergence of standard (local) clustering algorithms.