MESTMLDec 18, 2020

Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit

arXiv:2012.09996v449 citations
AI Analysis

This work is significant for researchers and practitioners working with multiway datasets in fields like neuroimaging and genomics, as it provides statistically optimal and computationally efficient methods for identifying heterogeneous substructures, addressing a known challenge in high-order clustering.

This paper addresses high-order clustering in multiway datasets by proposing a tensor block model and two computationally efficient methods, high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC). The authors establish convergence guarantees and statistical optimality for their methods under sub-Gaussian noise, and characterize the statistical-computational trade-off for exact clustering under a Gaussian tensor block model across three signal-to-noise ratio regimes.

High-order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non-convex and discontinuous nature of this problem pose significant challenges in both statistics and computation. In this paper, we propose a tensor block model and the computationally efficient methods, \emph{high-order Lloyd algorithm} (HLloyd), and high-order spectral clustering (HSC), for high-order clustering. The convergence guarantees and statistical optimality are established for the proposed procedure under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterize the statistical-computational trade-off for achieving high-order exact clustering based on three different signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a ``singular-value-gap-free'' error bound in tensor estimation, which are substantially different from the matrix spectral analyses in the literature. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes