ST · ML · Jan 11, 2015

Identifiability and optimal rates of convergence for parameters of multiple types in finite mixtures

arXiv:1501.02497v14 citations
Originality: Incremental advance
AI Analysis

This work addresses theoretical challenges in mixture modeling that matter to statisticians and machine learning researchers. It provides foundational insights, but the advance is incremental because it builds on prior work.

This paper tackles parameter identifiability and convergence rates in finite mixture models. For strongly identifiable models it establishes optimal rates of convergence for mixing distributions: $n^{-1/2}$ under $W_1$ in the exact-fitted setting and $n^{-1/4}$ under $W_2$ in the over-fitted setting. For weakly identifiable classes such as location-scale Gaussian mixtures, it shows that rates deteriorate rapidly as extra components are added.

This paper studies identifiability and convergence behaviors for parameters of multiple types in finite mixtures, and the effects of model fitting with extra mixing components. First, we present a general theory for strong identifiability, which extends the previous work of Nguyen [2013] and Chen [1995] to address a broad range of mixture models and to handle matrix-variate parameters. These models are shown to share the same Wasserstein-distance-based optimal rates of convergence for the space of mixing distributions: $n^{-1/2}$ under $W_1$ for the exact-fitted setting and $n^{-1/4}$ under $W_2$ for the over-fitted setting, where $n$ is the sample size. This theory, however, is not applicable to several important model classes, including location-scale multivariate Gaussian mixtures, shape-scale Gamma mixtures and location-scale-shape skew-normal mixtures. The second part of this work is devoted to demonstrating that for these "weakly identifiable" classes, algebraic structures of the density family play a fundamental role in determining convergence rates of the model parameters, which display a very rich spectrum of behaviors. For instance, the optimal rate of parameter estimation in an over-fitted location-covariance Gaussian mixture is precisely determined by the order of a solvable system of polynomial equations; these rates deteriorate rapidly as more extra components are added to the model. The established rates for a variety of settings are illustrated by a simulation study.
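To make the notion of distance between mixing distributions concrete, the following is a minimal sketch of how $W_1$ could be computed between two discrete mixing measures. It assumes univariate atoms (the paper treats far more general parameter spaces, including matrix-variate ones) and uses the one-dimensional identity that $W_1$ equals the integral of the absolute difference between the two cumulative distribution functions. The function name `w1_discrete` and the example atoms and weights are hypothetical and do not come from the paper.

```python
def w1_discrete(atoms_p, weights_p, atoms_q, weights_q):
    """W_1 between two discrete measures on the real line.

    Assumes each weight list sums to 1. Uses the 1-D identity
    W_1(P, Q) = integral of |F_P(x) - F_Q(x)| dx.
    """
    # Signed mass (P minus Q) at every support point of either measure.
    mass = {}
    for x, w in zip(atoms_p, weights_p):
        mass[x] = mass.get(x, 0.0) + w
    for x, w in zip(atoms_q, weights_q):
        mass[x] = mass.get(x, 0.0) - w

    points = sorted(mass)
    total = 0.0
    cdf_gap = 0.0  # running value of F_P - F_Q
    for left, right in zip(points, points[1:]):
        cdf_gap += mass[left]
        # The CDF gap is constant on [left, right).
        total += abs(cdf_gap) * (right - left)
    return total


# Exact-fitted illustration: a two-atom "true" mixing measure against
# a two-atom estimate (hypothetical numbers).
exact = w1_discrete([-1.0, 1.0], [0.5, 0.5], [-0.9, 1.2], [0.5, 0.5])

# Over-fitted illustration: a single true atom at 0 approximated by
# two atoms that straddle it (hypothetical numbers).
over = w1_discrete([0.0], [1.0], [-0.1, 0.1], [0.5, 0.5])

print(exact, over)
```

In the first call each estimated atom is matched to the nearest true atom, so the distance is a weighted sum of atom displacements (about 0.15 here). In the second call the single true atom splits its mass across two estimated atoms, which is the over-fitted situation the abstract describes.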

