How Many Communities Are There?
This work addresses model selection for community detection in social networks, offering an incremental improvement over existing criteria such as BIC by accounting for violations of the conditional independence assumption.
The authors tackle the problem of selecting the number of communities in stochastic blockmodels for network data. They propose a composite likelihood BIC (CL-BIC) that is robust to model misspecification, as demonstrated on simulated and real data.
Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models, and it raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution. For stochastic blockmodels, however, the assumption that different edges are conditionally independent given the communities of their endpoints is usually violated in practice. In this regard, we propose the composite likelihood BIC (CL-BIC) to select the number of communities, and we show that it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.
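To make the model selection problem concrete, the following is a minimal sketch of the standard BIC baseline, not of CL-BIC and not of the authors' supplementary code. It assumes an undirected network without self-loops, one Bernoulli edge probability per pair of communities, and community labels that are already known; in practice the labels would have to be estimated. The function names `simulate_sbm` and `sbm_bic`, the penalty count of K(K+1)/2 parameters, and the use of the number of dyads as the sample size are all illustrative assumptions.

```python
import math
import random


def simulate_sbm(labels, block_probs, seed=0):
    """Draw an undirected, loop-free adjacency matrix from a stochastic blockmodel.

    labels[i] is the community of node i; block_probs[a][b] is the (assumed)
    edge probability between communities a and b.
    """
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < block_probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


def sbm_bic(adj, labels, k):
    """Standard BIC for a fixed assignment into k communities.

    Treats every dyad as an independent Bernoulli draw given the labels,
    which is exactly the conditional independence assumption discussed above.
    """
    n = len(labels)
    edges = [[0] * k for _ in range(k)]  # observed edges per block pair
    pairs = [[0] * k for _ in range(k)]  # possible dyads per block pair
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((labels[i], labels[j]))
            edges[a][b] += adj[i][j]
            pairs[a][b] += 1

    loglik = 0.0
    for a in range(k):
        for b in range(a, k):
            m, e = pairs[a][b], edges[a][b]
            if 0 < e < m:  # blocks with e == 0 or e == m contribute zero
                p = e / m  # maximum likelihood estimate for this block pair
                loglik += e * math.log(p) + (m - e) * math.log(1 - p)

    n_params = k * (k + 1) // 2   # assumption: one probability per block pair
    n_dyads = n * (n - 1) // 2    # assumption: dyads play the role of sample size
    return -2.0 * loglik + n_params * math.log(n_dyads)


# Two planted communities of 30 nodes each, dense within and sparse between.
true_labels = [0] * 30 + [1] * 30
adj = simulate_sbm(true_labels, [[0.5, 0.05], [0.05, 0.5]], seed=1)

bic_one = sbm_bic(adj, [0] * 60, k=1)    # everything in a single community
bic_two = sbm_bic(adj, true_labels, k=2)  # the planted two-community split
print("BIC with 1 community:  ", round(bic_one, 1))
print("BIC with 2 communities:", round(bic_two, 1))
```

Lower values are preferred under this sign convention, so on data that really follow the blockmodel the planted split should score better than the single-community fit. The sketch does not reproduce dependence among edges, which is the setting in which the abstract argues a composite likelihood criterion is more appropriate.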