SILGMLNov 15, 2019

Active learning in the geometric block model

arXiv:1912.06570v18 citations
Originality Incremental advance
AI Analysis

This work addresses community detection in networks by enhancing recovery accuracy with minimal labeled data, representing an incremental improvement over existing unsupervised methods.

The paper tackles the problem of exactly recovering community structures in random graphs using the geometric block model by introducing active learning algorithms that query node labels, showing that querying a sub-linear fraction of nodes achieves exact recovery where unsupervised methods fail, with validation on synthetic and real datasets.

The geometric block model is a recently proposed generative model for random graphs that is able to capture the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures compared with the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is proved to be near-optimal. They also characterized the regimes of the model parameters for which the proposed algorithm can achieve exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in the problem of exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, by possibly querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine the idea of motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes under which the state-of-the-art unsupervised method fails. We validate the superior performance of our algorithms via numerical simulations on both real and synthetic datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes