DS AI SI · Sep 24, 2013

Partition-Merge: Distributed Inference and Modularity Optimization

Vincent Blondel, Kyomin Jung, Pushmeet Kohli, Devavrat Shah

arXiv:1309.6129v1

Originality: Incremental advance

AI Analysis

This work enables scalable distributed inference and modularity optimization on large graphs, with particular benefit for network analysis and probabilistic modeling. It is an incremental advance, however, because it builds on existing centralized methods.

The paper introduces Partition-Merge (PM), a meta algorithm that converts centralized graph algorithms into distributed versions. On graphs with geometric structure, the resulting algorithms run in near-linear time and perform comparably to, or better than, the originals: for example, a constant C-factor centralized approximation becomes a (C+δ)-factor distributed approximation.
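The partition, solve, stitch pipeline can be made concrete with a minimal Python sketch, written under our own assumptions. The partitioning shown is one plausible randomized scheme, ball carving: nodes are visited in random order and each claims every still-unassigned node within a randomly sampled hop radius. The truncated geometric radius distribution, the parameters `eps` and `max_radius`, and all function names are hypothetical and not taken from the paper. Stitching is a plain union of per-part outputs that ignores edges crossing parts; the paper's merge step for a given problem may be more involved. The toy centralized solver just labels connected components.

```python
import random
from collections import deque
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets
Solver = Callable[[Graph], Dict[Hashable, Hashable]]  # a centralized algorithm


def bfs_within(graph: Graph, source: Hashable, radius: int) -> Set[Hashable]:
    """Return every node at most `radius` hops from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the ball boundary
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def sample_radius(eps: float, max_radius: int, rng: random.Random) -> int:
    """Hypothetical truncated geometric radius: P(r) = eps*(1-eps)**(r-1) for
    r < max_radius, with all remaining probability mass on max_radius."""
    r = 1
    while r < max_radius and rng.random() >= eps:
        r += 1
    return r


def random_partition(graph: Graph, eps: float, max_radius: int,
                     seed: int = 0) -> List[Set[Hashable]]:
    """Ball carving: visit nodes in random order; each node claims all
    still-unassigned nodes within its sampled radius (first claim wins)."""
    rng = random.Random(seed)
    order = list(graph)
    rng.shuffle(order)
    owner: Dict[Hashable, Hashable] = {}
    for v in order:
        for u in bfs_within(graph, v, sample_radius(eps, max_radius, rng)):
            owner.setdefault(u, v)
    parts: Dict[Hashable, Set[Hashable]] = {}
    for u, v in owner.items():
        parts.setdefault(v, set()).add(u)
    return list(parts.values())


def partition_merge(graph: Graph, solve: Solver, eps: float = 0.3,
                    max_radius: int = 3, seed: int = 0) -> Dict[Hashable, Hashable]:
    """Run the centralized `solve` on each induced part, then stitch by union."""
    solution: Dict[Hashable, Hashable] = {}
    for part in random_partition(graph, eps, max_radius, seed):
        sub = {u: graph[u] & part for u in part}  # drop edges leaving the part
        solution.update(solve(sub))  # parts are disjoint, so keys never collide
    return solution


def grid_graph(n: int) -> Graph:
    """n x n grid: a simple graph with polynomial (quadratic) growth."""
    g: Graph = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in g:
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in g:
                g[(i, j)].add(nb)
                g[nb].add((i, j))
    return g


def components(sub: Graph) -> Dict[Hashable, Hashable]:
    """Toy centralized solver: label each node by its connected component."""
    label: Dict[Hashable, Hashable] = {}
    for s in sub:
        if s not in label:
            comp = bfs_within(sub, s, len(sub))
            rep = min(comp)  # smallest node (tuples compare lexicographically)
            for u in comp:
                label[u] = rep
    return label


if __name__ == "__main__":
    g = grid_graph(6)
    labels = partition_merge(g, components, seed=1)
    print(len(set(labels.values())), "labels over", len(labels), "nodes")
```

Each part is solved independently, so the per-part calls could run in parallel. When `max_radius` is fixed and the graph has polynomial growth, every part has bounded size, so the total work grows roughly linearly in the number of nodes; this is the intuition behind the near-linear runtime, not a reproduction of the paper's analysis.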

This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions together to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems essentially run in time linear in the number of nodes in the graph, and perform as well as, or even better than, the original centralized algorithm as long as the graph has geometric structures. Here we say a graph has geometric structures, or the polynomial growth property, when the number of nodes within distance r of any given node grows no faster than a polynomial function of r. More precisely, if the centralized algorithm is a C-factor approximation with constant C ≥ 1, the resulting distributed algorithm is a (C+δ)-factor approximation for any small δ > 0; but if the centralized algorithm is a non-constant (e.g. logarithmic) factor approximation, then the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm.
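The polynomial growth property can be probed empirically on a finite graph by measuring ball sizes |B(v, r)| with breadth-first search. The following is a minimal sketch under our own assumptions: hop count is taken as the graph distance, and the constants `C` and `rho` and all helper names are hypothetical. A finite check like this is evidence of polynomial growth, not a proof, since the property concerns how ball sizes scale with r. A 2D grid grows quadratically, whereas a binary tree grows exponentially and eventually violates any fixed polynomial bound.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def ball_sizes(graph: Graph, source: Hashable, max_r: int) -> List[int]:
    """sizes[r] = |B(source, r)|, the number of nodes within r hops, r = 0..max_r."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_r:
            continue
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0] * (max_r + 1)
    for d in dist.values():
        counts[d] += 1  # nodes at exactly distance d
    sizes, total = [], 0
    for c in counts:
        total += c
        sizes.append(total)  # cumulative: nodes at distance <= r
    return sizes


def max_ball_sizes(graph: Graph, max_r: int) -> List[int]:
    """Largest ball size over all centers v, for each radius 0..max_r."""
    best = [0] * (max_r + 1)
    for v in graph:
        for r, s in enumerate(ball_sizes(graph, v, max_r)):
            best[r] = max(best[r], s)
    return best


def growth_bounded(graph: Graph, max_r: int, C: float, rho: float) -> bool:
    """Check |B(v, r)| <= C * r**rho for every v and every 1 <= r <= max_r."""
    sizes = max_ball_sizes(graph, max_r)
    return all(sizes[r] <= C * r ** rho for r in range(1, max_r + 1))


def grid_graph(n: int) -> Graph:
    """n x n grid graph with nodes (i, j)."""
    g: Graph = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in g:
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in g:
                g[(i, j)].add(nb)
                g[nb].add((i, j))
    return g


def binary_tree(depth: int) -> Graph:
    """Complete binary tree on nodes 1..2**(depth+1)-1 (heap indexing)."""
    n = 2 ** (depth + 1) - 1
    g: Graph = {v: set() for v in range(1, n + 1)}
    for v in range(2, n + 1):
        g[v].add(v // 2)
        g[v // 2].add(v)
    return g
```

On a 10 x 10 grid, `max_ball_sizes(grid_graph(10), 4)` gives the largest balls as 1, 5, 13, 25, 41 for r = 0..4, matching the 2r² + 2r + 1 points within hop distance r on an unbounded grid, so the bound holds with rho = 2 and C = 5 but fails with rho = 1.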
