Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
This work addresses the challenge of improving clustering accuracy in networks with varied node degrees, which is incremental as it builds on prior variations of spectral clustering.
The paper tackles the problem of spectral clustering's performance on networks with heterogeneous node degrees by extending statistical estimation results to the canonical algorithm, removing minimum degree assumptions and providing tuning parameter guidance, while explaining the 'star shape' in eigenvectors using degree-corrected models.
Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. (2012) and Amini et al.(2012) proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of the tuning parameter. Moreover, our results show how the "star shape" in the eigenvectors--a common feature of empirical networks--can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models.