DS DC LGJul 18, 2016

Distributed Graph Clustering by Load Balancing

arXiv:1607.04984v31.27 citations

Originality Incremental advance

AI Analysis

This addresses the need for efficient distributed clustering algorithms for big datasets, though it is incremental as it builds on existing load balancing protocols.

The paper tackles the problem of graph clustering in distributed settings by introducing a simple distributed algorithm based on load balancing, which achieves poly-logarithmic rounds and recovers a partition close to optimal for graphs with strong cluster-structure.

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. However, most of these methods are based on complicated spectral techniques or convex optimisation, and cannot be applied directly for clustering many networks that occur in practice, whose information is often collected on different sites. Designing a simple and distributed clustering algorithm is of great interest, and has wide applications for processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds, and recovers a partition of the graph close to an optimal partition. The main component of our algorithm is an application of the random matching model of load balancing, which is a fundamental protocol in distributed computing and has been extensively studied in the past 20 years. Hence, our result highlights an intrinsic and interesting connection between graph clustering and load balancing. At a technical level, we present a purely algebraic result characterising the early behaviours of load balancing processes for graphs exhibiting a cluster-structure. We believe that this result can be further applied to analyse other gossip processes, such as rumour spreading and averaging processes.

View on arXiv PDF

Similar