ML CV LGJan 21, 2025

Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

Sebastian Salwig, Till Kahlke, Florian Hirschberger, Dennis Forster, Jörg Lücke

arXiv:2501.12299v14.5h-index: 7

Originality Incremental advance

AI Analysis

This work addresses the scalability problem for machine learning practitioners using GMMs on massive datasets, representing an incremental improvement through algorithmic optimization.

The paper tackles the computational challenge of training large Gaussian Mixture Models (GMMs) with high-dimensional data by proposing a variational approximation integrated with mixtures of factor analyzers, achieving speed-ups of an order-of-magnitude and training GMMs with over 10 billion parameters in about nine hours on a single CPU.

Gaussian Mixture Models (GMMs) range among the most frequently used machine learning models. However, training large, general GMMs becomes computationally prohibitive for datasets with many data points $N$ of high-dimensionality $D$. For GMMs with arbitrary covariances, we here derive a highly efficient variational approximation, which is integrated with mixtures of factor analyzers (MFAs). For GMMs with $C$ components, our proposed algorithm significantly reduces runtime complexity per iteration from $\mathcal{O}(NCD^2)$ to a complexity scaling linearly with $D$ and remaining constant w.r.t. $C$. Numerical validation of this theoretical complexity reduction then shows the following: the distance evaluations required for the entire GMM optimization process scale sublinearly with $NC$. On large-scale benchmarks, this sublinearity results in speed-ups of an order-of-magnitude compared to the state-of-the-art. As a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, and observe training times of approximately nine hours on a single state-of-the-art CPU.

View on arXiv PDF

Similar