Community models for networks observed through edge nominations
This work addresses community detection in networks collected via edge nominations, a common but noisy data collection method in social and organizational studies.
The authors address community detection in networks whose edges are observed through noisy and biased node queries. They propose a model in which edge sampling probabilities depend on both individual and community parameters, show that spectral clustering detects communities under this model, and introduce a parametric special case for directed networks, the nomination stochastic block model, which can be fitted by the method of moments. Both methods are computationally efficient and come with consistency guarantees. Applied to a faculty hiring dataset, the model reveals a meaningful hierarchy of communities among US business schools.
Communities are a common and widely studied structure in networks, typically under the assumption that the network is fully and correctly observed. In practice, network data are often collected by querying nodes about their connections. In some settings, all edges of a sampled node are recorded; in others, a node is asked to name its connections. These sampling mechanisms introduce noise and bias, which can obscure the community structure and invalidate assumptions underlying standard community detection methods. We propose a general model for a class of network sampling mechanisms based on recording edges via querying nodes, designed to improve community detection for network data collected in this fashion. We model edge sampling probabilities as a function of both individual preferences and community parameters, and show that community detection can be performed by spectral clustering under this general class of models. We also propose, as a special case of the general framework, a parametric model for directed networks we call the nomination stochastic block model, which allows for meaningful parameter interpretations and can be fitted by the method of moments. Both spectral clustering and the method of moments in this case are computationally efficient and come with theoretical guarantees of consistency. We evaluate the proposed model in simulation studies on both unweighted and weighted networks and apply it to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.
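To make the setting concrete, the following is a minimal sketch of spectral clustering on a directed network observed through nominations. It is not the nomination stochastic block model or the estimator from the paper, neither of which is specified in this abstract. The sampling mechanism is an assumption made for illustration: a two-block network in which node i reports each of its true edges independently at its own rate. The function names `simulate_nominations` and `spectral_communities`, all parameter values, and the choice of embedding (leading left and right singular vectors, row-normalized, then a basic k-means) are likewise hypothetical.

```python
import numpy as np


def simulate_nominations(n=300, p_in=0.40, p_out=0.05, seed=0):
    """Toy data (assumed mechanism, not the paper's model)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)

    # True undirected network from a two-block stochastic block model.
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    true_adj = upper | upper.T

    # Assumed nomination step: node i reports each true edge
    # independently with its own individual rate.
    report_rate = rng.uniform(0.3, 0.9, size=n)
    nominated = rng.random((n, n)) < report_rate[:, None]

    # Row i holds the nominations made by node i, so the result is directed.
    observed = (true_adj & nominated).astype(float)
    return observed, labels


def spectral_communities(observed, k=2, n_iter=50):
    """Cluster nodes from the top-k singular vectors of the observed matrix."""
    u, _, vt = np.linalg.svd(observed)

    # Combine sender-side (left) and receiver-side (right) embeddings,
    # then row-normalize to reduce the effect of individual reporting rates.
    embed = np.hstack([u[:, :k], vt[:k].T])
    norms = np.linalg.norm(embed, axis=1, keepdims=True)
    embed = embed / np.maximum(norms, 1e-12)

    # Basic k-means with deterministic farthest-point initialization.
    centers = [embed[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((embed - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(embed[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = ((embed[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):  # keep the old center if a cluster empties
                centers[c] = embed[assign == c].mean(axis=0)
    return assign


if __name__ == "__main__":
    observed, labels = simulate_nominations()
    pred = spectral_communities(observed, k=2)
    # Community labels are identifiable only up to permutation.
    accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
    print(f"fraction of nodes correctly clustered: {accuracy:.3f}")
```

The singular value decomposition is used rather than an eigendecomposition because the observed matrix is asymmetric: who nominates whom differs from who is nominated by whom. Row normalization is one common way to dampen degree heterogeneity, and other choices are possible.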