MLDIS-NNITOct 10, 2016

Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering

arXiv:1610.02918v144 citations
Originality Incremental advance
AI Analysis

This addresses fundamental limits in clustering for high-dimensional data, with incremental theoretical insights.

The paper tackles the problem of Gaussian mixture clustering in high dimensions, determining the critical data-to-dimension ratio and cluster separation at which cluster membership can be recovered better than chance, and finds a performance gap between optimal and known algorithms when the number of clusters is large.

We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of $m$ points in $n$ dimensions, $n,m \rightarrow \infty$ and $α= m/n$ stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of $α$ and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, $r > 4 + 2 \sqrtα$, there is a gap between the threshold for information-theoretically optimal performance and the threshold at which known algorithms succeed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes