Randomized spectral co-clustering for large-scale directed networks
This work addresses scalability issues in network analysis for researchers and practitioners dealing with large directed datasets, representing an incremental improvement over existing spectral methods.
The paper tackles the computational challenge of co-clustering large-scale directed networks by developing randomized spectral co-clustering algorithms, achieving better theoretical error bounds than state-of-the-art and testing on networks with up to millions of nodes.
Directed networks are broadly used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose great computational challenges to it. In this paper, we leverage sketching techniques and derive two randomized spectral co-clustering algorithms, one \emph{random-projection-based} and the other \emph{random-sampling-based}, to accelerate the co-clustering of large-scale directed networks. We theoretically analyze the resulting algorithms under two generative models -- the stochastic co-block model and the degree-corrected stochastic co-block model, and establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature. Numerically, we design and conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to millions of nodes. A publicly available R package \textsf{RandClust} is developed for better usability and reproducibility of the proposed methods.