Whitening Spherical Gaussian Mixtures in the Large-Dimensional Regime
The work addresses a bottleneck in unsupervised learning for high-dimensional data analysis, although the contribution is incremental because it builds on existing whitening techniques.
The paper tackles the problem of whitening spherical Gaussian mixture models in high-dimensional, low-sample-size regimes, where standard whitening fails because the sample covariance is spectrally distorted. It proposes a corrected whitening matrix that restores asymptotic orthogonality of the whitened means, allowing for performance gains in estimation.
Whitening is a classical technique in unsupervised learning that can facilitate estimation tasks by standardizing data. An important application is the estimation of latent variable models via the decomposition of tensors built from high-order moments. In particular, whitening orthogonalizes the means of a spherical Gaussian mixture model (GMM), thereby making the corresponding moment tensor orthogonally decomposable, hence easier to decompose. However, in the large-dimensional regime (LDR) where data are high-dimensional and scarce, the standard whitening matrix built from the sample covariance becomes ineffective because the latter is spectrally distorted. Consequently, whitened means of a spherical GMM are no longer orthogonal. Using random matrix theory, we derive exact limits for their dot products, which are generally nonzero in the LDR. As our main contribution, we then construct a corrected whitening matrix that restores asymptotic orthogonality, allowing for performance gains in spherical GMM estimation.
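The loss of orthogonality described above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes the common moment-based setup in which the whitening matrix is built from the top-K eigenpairs of the second-moment matrix with the noise level subtracted, and all names (`whitening_matrix`, `whitened_gram`) and parameter values (`K`, `p`, `n`, `sigma`) are hypothetical choices for illustration. The corrected whitening matrix proposed in the paper is not reproduced here.

```python
import numpy as np

def whitening_matrix(M2, K):
    """Return W (p x K) with W.T @ M2 @ W = I_K, using the top-K eigenpairs of M2."""
    eigvals, eigvecs = np.linalg.eigh(M2)      # eigenvalues in ascending order
    top_vals = eigvals[-K:]
    top_vecs = eigvecs[:, -K:]
    return top_vecs / np.sqrt(top_vals)        # scale each eigenvector column

def whitened_gram(W, means, weights):
    """Gram matrix of the whitened, weight-scaled means sqrt(w_k) * W.T @ mu_k."""
    A = (W.T @ means.T) * np.sqrt(weights)     # K x K, column k is whitened mean k
    return A.T @ A

rng = np.random.default_rng(0)
K, p, n, sigma = 3, 200, 400, 1.0              # assumed values; p/n = 0.5 mimics the LDR
weights = np.array([0.5, 0.3, 0.2])
means = 3.0 * rng.standard_normal((K, p)) / np.sqrt(p)   # K means with norm around 3

# Population case: M2 = sum_k w_k mu_k mu_k^T, so whitening is exact.
M2_pop = (means.T * weights) @ means
gram_pop = whitened_gram(whitening_matrix(M2_pop, K), means, weights)

# Sample case: draw n points from the spherical GMM and estimate M2.
labels = rng.choice(K, size=n, p=weights)
X = means[labels] + sigma * rng.standard_normal((n, p))
M2_hat = X.T @ X / n - sigma**2 * np.eye(p)    # assumes sigma is known
gram_sample = whitened_gram(whitening_matrix(M2_hat, K), means, weights)

dev_pop = np.abs(gram_pop - np.eye(K)).max()
dev_sample = np.abs(gram_sample - np.eye(K)).max()
print("max deviation from identity (population):", dev_pop)
print("max deviation from identity (sample):    ", dev_sample)
```

With the population second moment, the Gram matrix of the whitened means equals the identity up to rounding. With the sample estimate at a comparable `p` and `n`, it does not, which is the effect the abstract attributes to spectral distortion of the sample covariance. The true means are passed to `whitened_gram` in both cases so that only the whitening matrix differs.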