LG · ML · Sep 6, 2019

AutoGMM: Automatic Gaussian Mixture Modeling in Python

arXiv:1909.02688v6 · 15 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

AutoGMM offers an incremental solution for Python users who need automatic, uncertainty-aware clustering without manual hyperparameter tuning.

The authors address the lack of a Python tool comparable to mclust in R for automating Gaussian mixture model clustering. They introduce AutoGMM, an open-source package that achieves strong out-of-the-box performance on benchmarks and real datasets, with favorable runtime scaling.

The exponential growth of complex data demands fully automatic clustering. Gaussian mixture models (GMMs) provide uncertainty-aware grouping but often require expertise to specify hyperparameters, e.g., component count and covariance structure. While mclust (R) automates this via Bayesian Information Criterion (BIC), Python lacks a comparable tool. We introduce AutoGMM, an open-source Python package automating GMM via strategic initialization using an agglomerative Mahalanobis heuristic, and parallelized model selection by information criteria. AutoGMM is a drop-in tool that yields strong out-of-the-box performance on classic benchmarks, targeted stress tests, and two real datasets, with favorable runtime scaling. The code is available at https://github.com/neurodata/AutoGMM with tests and reproducible workflows.
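The selection loop described in the abstract can be illustrated with a minimal sketch built on scikit-learn. This is not AutoGMM's API or its exact algorithm: the function name `select_gmm_by_bic` and its parameters are hypothetical, Ward-linkage agglomerative clustering stands in for the paper's agglomerative Mahalanobis heuristic, and the grid is searched serially rather than in parallel. It only shows the general idea of seeding each candidate GMM from a hierarchical partition and keeping the model with the lowest BIC.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, max_components=5,
                      covariance_types=("full", "tied", "diag", "spherical"),
                      seed=0):
    """Hypothetical helper: fit a grid of GMMs and return the lowest-BIC model."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        # Assumption: Ward-linkage agglomerative labels seed the means.
        # This is a stand-in for the paper's initialization heuristic.
        if k == 1:
            labels = np.zeros(len(X), dtype=int)
        else:
            labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        means_init = np.vstack([X[labels == j].mean(axis=0) for j in range(k)])

        for cov_type in covariance_types:
            gmm = GaussianMixture(
                n_components=k,
                covariance_type=cov_type,
                means_init=means_init,
                random_state=seed,
            ).fit(X)
            bic = gmm.bic(X)  # lower BIC is better in scikit-learn
            if bic < best_bic:
                best_model, best_bic = gmm, bic
    return best_model, best_bic


# Synthetic data: three well-separated 2-D blobs of 100 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

model, bic = select_gmm_by_bic(X)
print(model.n_components, model.covariance_type, round(bic, 1))

# Soft assignments carry the uncertainty information mentioned above.
posteriors = model.predict_proba(X)
```

The returned model exposes `predict_proba`, so each point gets a posterior probability per component rather than a hard label. Because the candidate fits are independent, they could be run in parallel, which is the part of the design the abstract highlights.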

Code Implementations: 1 repo