Estimating the number of clusters of a Block Markov Chain
This work addresses the challenge of selecting cluster numbers in clustering algorithms for sequential data, which is incremental as it builds on existing spectral and Stochastic Block Model techniques.
The authors tackled the problem of estimating the number of clusters in sequential data modeled as a Block Markov Chain, presenting a method that uses spectral embedding and density-based clustering to achieve asymptotic consistency, even with sparse data.
Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.