Spectral goodness-of-fit tests for complete and partial network data
The method provides a computationally efficient way to assess model fit for network data, including partially observed data, which makes it useful to researchers in network analysis.
The authors tackled the problem of determining whether a parametric model fits network data well and will extrapolate to similar data, using random matrix theory to derive a general goodness-of-fit test that requires neither simulation from the model nor a chosen set of graph statistics. They showed that the method performs well in simulations and leads to improved community detection algorithms on empirical networks.
Networks describe the often complex relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well and will extrapolate to similar data. We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters in a number of commonly used network models. For example, we show how to select the dimension of the latent space in latent space models. Unlike other network goodness-of-fit methods, our general approach does not require simulating from a candidate parametric model, which can be cumbersome with large graphs, and eliminates the need to choose a particular set of statistics on the graph for comparison. It also allows us to perform goodness-of-fit tests on partial network data, such as Aggregated Relational Data. We show with simulations that our method performs well in many situations of interest. We analyze several empirically relevant networks and show that our method leads to improved community detection algorithms. R code to implement our method is available on GitHub.
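The idea of a simulation-free spectral test can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It assumes a residual-based statistic of the kind common in the random matrix literature: center and scale the adjacency matrix by the fitted edge probabilities, then compare the largest eigenvalue of the result with the value 2 expected when the model fits. The function names (`spectral_statistic`, `fit_single_density`, `sample_graph`), the n^(2/3) scaling, and the toy block probabilities are all assumptions made for illustration.

```python
import numpy as np


def sample_graph(P, rng):
    """Draw a symmetric, loop-free 0/1 adjacency matrix with edge probabilities P."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)  # independent edges above the diagonal
    return (upper + upper.T).astype(float)


def fit_single_density(A):
    """Hypothetical candidate model: one edge probability shared by all dyads."""
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))
    return np.full((n, n), p_hat)


def spectral_statistic(A, P_hat):
    """Scaled largest eigenvalue of the standardized residual matrix.

    Assumed form: R_ij = (A_ij - P_ij) / sqrt((n - 1) * P_ij * (1 - P_ij))
    off the diagonal and R_ii = 0. Under a well-fitting model the largest
    eigenvalue of R is close to 2, so the statistic should be close to zero.
    """
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)
    R = np.zeros((n, n))
    R[off] = (A[off] - P_hat[off]) / np.sqrt((n - 1) * P_hat[off] * (1 - P_hat[off]))
    lam_max = np.linalg.eigvalsh(R)[-1]  # eigvalsh returns eigenvalues in ascending order
    return n ** (2 / 3) * (lam_max - 2.0)


rng = np.random.default_rng(0)
n = 400

# Case 1: data really come from a single-density graph, so the candidate fits.
A_er = sample_graph(np.full((n, n), 0.1), rng)
stat_er = spectral_statistic(A_er, fit_single_density(A_er))

# Case 2: data come from a two-block model, so the single-density candidate misfits.
labels = np.repeat([0, 1], n // 2)
B = np.array([[0.20, 0.05], [0.05, 0.20]])
A_sbm = sample_graph(B[labels][:, labels], rng)
stat_sbm = spectral_statistic(A_sbm, fit_single_density(A_sbm))

print(f"well-specified model: {stat_er:.2f}")
print(f"misspecified model:   {stat_sbm:.2f}")
```

Under these assumptions, a statistic near zero is consistent with adequate fit, while a large positive value points to structure the candidate model misses. In practice the statistic would be compared with a Tracy-Widom quantile rather than judged by eye, and no graphs need to be simulated from the fitted model to carry out the comparison.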