Consistent spectral clustering in sparse tensor block models
This addresses clustering challenges in sparse high-dimensional multiway data for fields like bioinformatics and social network analysis, representing an incremental advancement in tensor methods.
The paper tackles high-order clustering in sparse multiway datasets by introducing a tensor block model for sparse integer-valued data and proposing a spectral clustering algorithm with trimming. The approach identifies a density threshold for algorithm consistency and provides a framework for evaluating signal-noise tradeoffs during data aggregation.
High-order clustering aims to classify objects in multiway datasets that are prevalent in various fields such as bioinformatics, social network analysis, and recommendation systems. These tasks often involve data that is sparse and high-dimensional, presenting significant statistical and computational challenges. This paper introduces a tensor block model specifically designed for sparse integer-valued data tensors. We propose a simple spectral clustering algorithm augmented with a trimming step to mitigate noise fluctuations, and identify a density threshold that ensures the algorithm's consistency. Our approach models sparsity using a sub-Poisson noise concentration framework, accommodating heavier than sub-Gaussian tails. Remarkably, this natural class of tensor block models is closed under aggregation across arbitrary modes. Consequently, we obtain a comprehensive framework for evaluating the tradeoff between signal loss and noise reduction during data aggregation. The analysis is based on a novel concentration bound for sparse random Gram matrices. The theoretical findings are illustrated through simulation experiments.