Sampling Random Graphs from the Colored Configuration Model
For network analysts studying polarization in online social networks, this provides a more accurate null model that preserves color assortativity, enabling better statistical testing of network phenomena.
The paper introduces the Colored Configuration Model (CCM), a null model for vertex-colored multigraphs that preserves the Colored Degree Matrix (CDM), enabling statistically sound analysis of social networks. The proposed Sirius algorithm samples from CCM with provably faster mixing, and experiments show it can lead to different insights on polarization significance compared to existing null models.
A fundamental step in knowledge discovery is statistically assessing data mining results. In network analysis, such evaluation compares the outcome of a given procedure with the outcomes obtained from randomized versions of the observed network. Despite its importance, available graph null models only preserve simple characteristics of the observed graph, such as its degree sequence. In this paper we introduce the Colored Configuration Model (CCM), a new null model for vertex-colored multigraphs. Our main motivation is the study of online social networks, where the color of a user represents their side in a debate. The key novelty of CCM is preserving the Colored Degree Matrix (CDM), which encodes, for each vertex, the number of neighbors of any given color. Preserving the CDM allows fixing the color assortativity of all nodes, e.g., the propensity of each user to interact with other like-minded users. This allows testing whether a given phenomenon is explained by the observed CDM, or whether other characteristics of the network might play a key role. Available graph null models do not preserve the CDM, so they cannot assess its impact on real-world tasks, such as testing the significance of network polarization measures. To sample from the CCM, we develop Sirius-B, a simple baseline adapting the Metropolis-Hastings approach, and Sirius, a refined algorithm tailored to preserve the CDM, thus achieving provably faster mixing. In our experimental evaluation, we test Sirius on real-world networks, comparing it with related network null models. We observed that the evaluation of the statistical significance of polarization measures with Sirius may lead to different insights compared to available null models. Thus, Sirius is an effective tool for the statistically-sound analysis of social networks.