SOC-PH SI ML | Jan 30, 2018

Weighted Community Detection and Data Clustering Using Message Passing

arXiv:1801.09829v1 | 15 citations

Originality: Incremental advance

AI Analysis

This work addresses clustering and community detection, problems that arise throughout science and engineering. It offers a robust, principled method; the advance is incremental, extending message passing and spectral algorithms from the unweighted to the weighted case.

The authors address weighted community detection and data clustering with a non-parametric method based on statistical physics: the problem is mapped to a Potts model and solved with belief propagation. They report that the algorithm significantly outperforms existing methods on weighted and directed networks, works down to the theoretical detectability limit for sparse mixture models, and needs only several labels to cluster classic datasets perfectly in the semi-supervised setting.

Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for the unweighted community detection problem, we develop a non-parametric method based on statistical physics, mapping the problem to a Potts model at the critical temperature of the spin glass transition and applying belief propagation to solve for the marginals of the corresponding Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate its advantage over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem where the data are generated by mixture models in the sparse regime, we show that our method works down to the theoretical limit of detectability and gives accuracy very close to that of optimal Bayesian inference. In the semi-supervised clustering problem, our method needs only several labels to work perfectly on classic datasets. Finally, we develop Thouless-Anderson-Palmer equations, which heavily reduce the computational complexity in dense networks but give almost the same performance as belief propagation.
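To make the belief propagation step concrete, here is a minimal sketch of generic loopy belief propagation for a Potts model with weighted couplings. It is an illustration under stated assumptions, not the authors' implementation: the function name `potts_bp`, the toy graph, and the hand-picked `beta` are all hypothetical, and `beta` is not set to the critical temperature described in the abstract. A couple of nodes are clamped to known labels, loosely mirroring the semi-supervised setting.

```python
import numpy as np

def potts_bp(weights, q, beta, labels=None, n_iter=100, tol=1e-8, seed=0):
    """Loopy BP marginals for a q-state Potts model (illustrative sketch).

    weights: dict {(i, j): w} of undirected weighted edges.
    labels:  optional dict {node: state} of clamped (known) labels.
    beta:    inverse temperature, chosen by hand here (an assumption).
    """
    rng = np.random.default_rng(seed)
    labels = labels or {}
    nodes = sorted({v for edge in weights for v in edge})
    nbrs = {v: {} for v in nodes}
    for (i, j), w in weights.items():
        nbrs[i][j] = w
        nbrs[j][i] = w

    # Local prior: uniform for free nodes, one-hot for clamped nodes.
    prior = {v: np.ones(q) for v in nodes}
    for v, s in labels.items():
        prior[v] = np.eye(q)[s]

    # Random normalized initial messages msg[(i, j)] from i to j.
    msg = {}
    for i in nodes:
        for j in nbrs[i]:
            m = rng.random(q) + 0.1
            msg[(i, j)] = m / m.sum()

    def belief(i, exclude=None):
        # Product over neighbours k of sum_t msg_k->i(t) * exp(beta*w*delta(s,t)),
        # which equals 1 + (exp(beta*w) - 1) * msg_k->i(s) for normalized messages.
        out = prior[i].copy()
        for k, w in nbrs[i].items():
            if k != exclude:
                out *= 1.0 + (np.exp(beta * w) - 1.0) * msg[(k, i)]
        return out / out.sum()

    for _ in range(n_iter):
        max_change = 0.0
        for (i, j) in list(msg):
            new = belief(i, exclude=j)
            max_change = max(max_change, float(np.abs(new - msg[(i, j)]).max()))
            msg[(i, j)] = new
        if max_change < tol:
            break

    return {i: belief(i) for i in nodes}

# Toy example: two triangles joined by one weak edge, one known label per group.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 0.1}
marginals = potts_bp(edges, q=2, beta=1.0, labels={0: 0, 5: 1})
assignment = {i: int(np.argmax(m)) for i, m in marginals.items()}
print(assignment)
```

Each node is assigned the state with the largest marginal. Without the clamped labels, a purely ferromagnetic model like this toy one has no term that penalizes putting every node in one group, so the unsupervised method in the paper necessarily involves more than this sketch shows.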
