Low-degree Lower bounds for clustering in moderate dimension
This work provides new theoretical insights into the computational hardness of clustering for researchers in theoretical computer science and machine learning, particularly in the moderate-dimensional setting where existing methods fall short of information-theoretic limits.
This paper investigates the computational limits of clustering $n$ points into $K$ groups from a mixture of isotropic Gaussians in $\mathbb{R}^d$, focusing on the moderate-dimensional regime ($n \geq dK$). It establishes a new low-degree polynomial lower bound for the minimal distance $\Delta$ between mean vectors required for partial recovery, revealing a "non-parametric rate" that differs from high-dimensional scenarios.
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $Δ$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $Δ$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.