Disentangling homophily, community structure and triadic closure in networks
This work is significant for network scientists and analysts who need to accurately understand the underlying mechanisms driving network formation, especially in cases where homophily and transitivity are intertwined.
This paper addresses the conflation of homophily and transitivity in network analysis by introducing a generative model and a corresponding inference procedure. The method, a variation of the stochastic block model (SBM) with triadic closure, can distinguish between the two mechanisms and identify the most plausible cause of each edge, and it improves edge prediction compared to the pure SBM.
Network homophily, the tendency of similar nodes to be connected, and transitivity, the tendency of two nodes to be connected if they share a common neighbor, are conflated properties in network analysis, since one mechanism can drive the other. Here we present a generative model and corresponding inference procedure that are capable of distinguishing between the two mechanisms. Our approach is based on a variation of the stochastic block model (SBM) with the addition of triadic closure edges, and its inference can identify the most plausible mechanism responsible for the existence of every edge in the network, in addition to the underlying community structure itself. We show how the method can avoid detecting spurious communities caused solely by the formation of triangles in the network, and how it can improve the performance of edge prediction when compared to the pure version of the SBM without triadic closure.
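To make the generative idea concrete, the following is a minimal sketch of an SBM with added triadic closure edges. It is a simplification under stated assumptions, not the paper's exact formulation: the function name `sbm_with_triadic_closure`, the two-parameter block structure (`p_in`, `p_out`) and the single global closure probability `p_close` are all hypothetical choices made for illustration. Only the forward (generative) direction is shown; the inference procedure is not.

```python
import itertools
import random


def sbm_with_triadic_closure(blocks, p_in, p_out, p_close, seed=None):
    """Sample "seminal" SBM edges, then close open triads among them.

    blocks  : list with the block label of each node (assumed input format)
    p_in    : edge probability for node pairs in the same block
    p_out   : edge probability for node pairs in different blocks
    p_close : probability of closing each open triad (assumed to be global)
    Returns (seminal, closure): two disjoint sets of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    n = len(blocks)

    # Step 1: homophily-like mechanism. Edges depend only on block membership.
    seminal = set()
    for u, v in itertools.combinations(range(n), 2):
        p = p_in if blocks[u] == blocks[v] else p_out
        if rng.random() < p:
            seminal.add((u, v))

    # Adjacency of the seminal graph, used to find open triads.
    neighbors = {u: set() for u in range(n)}
    for u, v in seminal:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Step 2: triadic closure. For each node u, every pair of its seminal
    # neighbors (v, w) that is not already connected may receive an edge.
    # Skipping pairs that already share a seminal edge is a simplification.
    closure = set()
    for u in range(n):
        for v, w in itertools.combinations(sorted(neighbors[u]), 2):
            if (v, w) not in seminal and rng.random() < p_close:
                closure.add((v, w))

    return seminal, closure


if __name__ == "__main__":
    blocks = [0] * 10 + [1] * 10
    seminal, closure = sbm_with_triadic_closure(blocks, 0.3, 0.02, 0.5, seed=42)
    print(len(seminal), "seminal edges;", len(closure), "closure edges")
```

Because the sketch returns the two edge sets separately, the mechanism behind each edge is known by construction. An observed network provides only their union, which is why an inference step is needed to recover the most plausible label for every edge.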