A Streaming Algorithm for Graph Clustering
This enables efficient large-scale graph analysis for applications like social networks or web graphs, though it appears incremental as it builds on modularity-based approaches.
The authors tackled the problem of graph clustering in edge streaming settings where graphs are processed as one-pass edge sequences, developing an algorithm that runs over 10 times faster than existing methods while maintaining or improving detection scores on graphs with up to one billion edges.
We introduce a novel algorithm to perform graph clustering in the edge streaming setting. In this model, the graph is presented as a sequence of edges that can be processed strictly once. Our streaming algorithm has an extremely low memory footprint as it stores only three integers per node and does not keep any edge in memory. We provide a theoretical justification of the design of the algorithm based on the modularity function, which is a usual metric to evaluate the quality of a graph partition. We perform experiments on massive real-life graphs ranging from one million to more than one billion edges and we show that this new algorithm runs more than ten times faster than existing algorithms and leads to similar or better detection scores on the largest graphs.