Parallel and Streaming Algorithms for K-Core Decomposition
The work addresses the need for efficient large-scale graph analysis in machine learning and data mining, and it introduces the first algorithms of their kind for this problem rather than an incremental improvement over existing ones.
The paper addresses the problem of computing and maintaining an approximate k-core decomposition in distributed and streaming settings. It presents the first algorithms for this problem with provable guarantees on space complexity and on the number of passes or rounds of computation, and it demonstrates their empirical effectiveness on public graphs.
The $k$-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate $k$-core decomposition with provable guarantees. Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or rounds of computation. We do so by presenting a powerful new sketching technique for $k$-core decomposition, and then showing that it can be computed efficiently in both the streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs.
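
For readers unfamiliar with the primitive, the following is a minimal sketch of the exact, sequential $k$-core decomposition computed by the standard peeling procedure: repeatedly remove a vertex of minimum remaining degree, and record the largest minimum degree seen so far as that vertex's core number. It is not the sketching technique or the distributed and streaming algorithms of this paper; it only illustrates the quantity those algorithms approximate. The function name `core_numbers` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Exact core number of every vertex, by minimum-degree peeling.

    Illustrative baseline only (hypothetical helper, not from the paper).
    Runs in O(n^2) time because it rescans for the minimum each step.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # Core numbers never decrease along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        # Removing v lowers the degree of its surviving neighbours.
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
example = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In this example the triangle vertices belong to the 2-core, while the pendant vertex belongs only to the 1-core. The peeling order is inherently sequential and needs the whole graph in memory, which is the limitation that motivates approximate decompositions in streaming and MapReduce settings.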