Spectral inference for large Stochastic Blockmodels with nodal covariates
This work addresses the need for efficient inference in network analysis among researchers and practitioners working with large datasets. Its contribution is incremental, however: it extends existing spectral methods by incorporating nodal covariates.
The paper tackles the problem of distinguishing observed from unobserved factors in network structure by developing spectral estimators for stochastic blockmodels with nodal covariates. It shows that the estimator is much faster to compute than standard variational expectation-maximization algorithms and scales well to large networks. Monte Carlo experiments indicate good performance, and an application to Facebook data reveals homophily in gender, role, and campus residence while uncovering unobserved communities.
In many applications of network analysis, it is important to distinguish between observed and unobserved factors affecting network structure. To this end, we develop spectral estimators for both unobserved blocks and the effect of covariates in stochastic blockmodels. On the theoretical side, we establish asymptotic normality of our estimators so that they can be used for inference. On the applied side, we show that computing our estimator is much faster than standard variational expectation-maximization algorithms and scales well for large networks. Monte Carlo experiments suggest that the estimator performs well under different data generating processes. Our application to Facebook data shows evidence of homophily in gender, role and campus residence, while allowing us to discover unobserved communities. The results in this paper provide a foundation for spectral estimation of the effect of observed covariates as well as unobserved latent community structure on the probability of link formation in networks.
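To make the idea of separating unobserved blocks from an observed covariate concrete, here is a minimal NumPy sketch under stated assumptions. It is not the paper's estimator. It simulates a hypothetical two-block network whose link log-odds are logit(B[z_i, z_j]) + beta * 1{x_i == x_j}, with made-up values for B and beta. It then recovers the blocks with a generic adjacency spectral embedding followed by k-means, and computes a crude within-block log-odds ratio as a stand-in for the covariate effect. All function names (`simulate_sbm_with_covariate`, `spectral_blocks`, `within_block_homophily`) are hypothetical. The plug-in step ignores misclassification and carries none of the asymptotic-normality guarantees the abstract describes.

```python
import numpy as np


def simulate_sbm_with_covariate(n=400, beta=0.5, seed=0):
    """Hypothetical DGP: logit P(A_ij = 1) = logit(B[z_i, z_j]) + beta * 1{x_i == x_j}."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                # unobserved block labels
    x = rng.integers(0, 2, size=n)                # observed binary covariate (e.g. gender)
    B = np.array([[0.30, 0.02], [0.02, 0.30]])    # assumed block link probabilities
    base = B[z[:, None], z[None, :]]
    log_odds = np.log(base / (1 - base)) + beta * (x[:, None] == x[None, :])
    prob = 1.0 / (1.0 + np.exp(-log_odds))
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # draw each pair i < j once
    A = (upper | upper.T).astype(float)              # symmetric, zero diagonal
    return A, z, x


def spectral_blocks(A, k=2, iters=50):
    """Generic adjacency spectral embedding followed by plain Lloyd k-means."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-k:]              # k eigenvalues largest in magnitude
    X = vecs[:, top] * np.sqrt(np.abs(vals[top]))    # n x k embedding
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):                  # leave an empty cluster's center in place
                centers[c] = X[labels == c].mean(axis=0)
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def within_block_homophily(A, blocks, x):
    """Crude log-odds ratio of linking for same- vs different-covariate pairs,
    using only pairs the spectral step places in the same block."""
    iu = np.triu_indices(A.shape[0], k=1)
    links = A[iu]
    same_block = (blocks[:, None] == blocks[None, :])[iu]
    same_x = (x[:, None] == x[None, :])[iu]
    p1 = links[same_block & same_x].mean()
    p0 = links[same_block & ~same_x].mean()
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))


A, z, x = simulate_sbm_with_covariate(n=400, beta=0.5, seed=0)
z_hat = spectral_blocks(A, k=2)
# Block labels are identified only up to permutation.
accuracy = max(np.mean(z_hat == z), np.mean(z_hat != z))
beta_hat = within_block_homophily(A, z_hat, x)
print(f"block recovery: {accuracy:.3f}, homophily log-odds: {beta_hat:.2f}")
```

With these assumed parameters, the block signal dominates the leading two eigenvalues, so the embedding separates the latent communities. Within a block, the covariate's contribution on the logit scale is exactly beta, which the stratified log-odds ratio targets. The dense `np.linalg.eigh` call is O(n^3). For genuinely large, sparse networks one would instead use a partial solver such as `scipy.sparse.linalg.eigsh`, which computes only the k leading eigenpairs.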