MSDCDSCOMLAug 21, 2017

Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge

arXiv:1708.07481v112 citations
Originality Incremental advance
AI Analysis

This work provides a faster and more scalable solution for graph partitioning in streaming and static scenarios, though it is incremental as it applies an existing optimization method to a specific challenge.

The paper tackles the problem of spectral clustering for large static and streaming graphs by applying the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method to graph Laplacians. It achieves 100% correctness on Challenge datasets, reducing computation time from over 5,000 seconds to 2 seconds for a 5K-vertex graph and handling up to 2M vertices in 1,700 seconds.

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 10-20 iterations of LOBPCG without preconditioning result in ~10x error reduction, enough to achieve 100% correctness for all Challenge datasets with known truth partitions, e.g., for graphs with 5K/.1M (50K/1M) Vertices/Edges in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code 100% correctly determines 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our single-precision MATLAB code calculates the same clusters at half time and memory. For streaming graph partitioning, LOBPCG is initiated with approximate eigenvectors of the graph Laplacian already computed for the previous graph, in many cases reducing 2-3 times the number of required LOBPCG iterations, compared to the static case. Our spectral clustering is generic, i.e. assuming nothing specific of the block model or streaming, used to generate the graphs for the Challenge, in contrast to the base code. Nevertheless, in 10-stage streaming comparison with the base code for the 5K graph, the quality of our clusters is similar or better starting at stage 4 (7) for emerging edging (snowballing) streaming, while the computations are over 100-1000 faster.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes