SI LGAug 15, 2021

SamBaS: Sampling-Based Stochastic Block Partitioning

Frank Wanye, Vitaliy Gleyzer, Edward Kao, Wu-chun Feng

arXiv:2108.06651v21.2

Originality Incremental advance

AI Analysis

This work addresses the need for faster community detection algorithms in domains like networking and bioinformatics, offering a practical speedup for a statistically robust but hard-to-parallelize method, though it is incremental as it builds on existing SBP.

The paper tackles the challenge of accelerating stochastic block partitioning (SBP) for community detection in sparse graphs by introducing SamBaS, a sampling-based method that achieves speedups of up to 10X while maintaining or improving result quality, with improvements of over 150% in F1 score on some graphs.

Community detection is a well-studied problem with applications in domains ranging from networking to bioinformatics. Due to the rapid growth in the volume of real-world data, there is growing interest in accelerating contemporary community detection algorithms. However, the more accurate and statistically robust methods tend to be hard to parallelize. One such method is stochastic block partitioning (SBP) - a community detection algorithm that works well on graphs with complex and heterogeneous community structure. In this paper, we present a sampling-based SBP (SamBaS) for accelerating SBP on sparse graphs. We characterize how various graph parameters affect the speedup and result quality of community detection with SamBaS and quantify the trade-offs therein. To evaluate SamBas on real-world web graphs without known ground-truth communities, we introduce partition quality score (PQS), an evaluation metric that outperforms modularity in terms of correlation with F1 score. Overall, SamBaS achieves speedups of up to 10X while maintaining result quality (and even improving result quality by over 150% on certain graphs, relative to F1 score).

View on arXiv PDF

Similar