MLSTAT-MECHSIFeb 19, 2015

Finding One Community in a Sparse Graph

arXiv:1502.05680v273 citations
AI Analysis

This work addresses the challenge of community detection in social networks or relational datasets, providing theoretical insights into algorithmic limits, but it is incremental as it builds on existing methods like belief propagation and phase transition analysis.

The paper tackles the problem of identifying a hidden community in a sparse graph with bounded average degree, using the cavity method to derive an exact phase diagram and proving that local algorithms like belief propagation succeed above a threshold where the difference in internal and external average degrees exceeds a specific formula, but fail below it.

We consider a random sparse graph with bounded average degree, in which a subset of vertices has higher connectivity than the background. In particular, the average degree inside this subset of vertices is larger than outside (but still bounded). Given a realization of such graph, we aim at identifying the hidden subset of vertices. This can be regarded as a model for the problem of finding a tightly knitted community in a social network, or a cluster in a relational dataset. In this paper we present two sets of contributions: $(i)$ We use the cavity method from spin glass theory to derive an exact phase diagram for the reconstruction problem. In particular, as the difference in edge probability increases, the problem undergoes two phase transitions, a static phase transition and a dynamic one. $(ii)$ We establish rigorous bounds on the dynamic phase transition and prove that, above a certain threshold, a local algorithm (belief propagation) correctly identify most of the hidden set. Below the same threshold \emph{no local algorithm} can achieve this goal. However, in this regime the subset can be identified by exhaustive search. For small hidden sets and large average degree, the phase transition for local algorithms takes an intriguingly simple form. Local algorithms succeed with high probability for ${\rm deg}_{\rm in} - {\rm deg}_{\rm out} > \sqrt{{\rm deg}_{\rm out}/e}$ and fail for ${\rm deg}_{\rm in} - {\rm deg}_{\rm out} < \sqrt{{\rm deg}_{\rm out}/e}$ (with ${\rm deg}_{\rm in}$, ${\rm deg}_{\rm out}$ the average degrees inside and outside the community). We argue that spectral algorithms are also ineffective in the latter regime. It is an open problem whether any polynomial time algorithms might succeed for ${\rm deg}_{\rm in} - {\rm deg}_{\rm out} < \sqrt{{\rm deg}_{\rm out}/e}$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes