Submodular Variational Inference for Network Reconstruction
This addresses the challenge of inferring hidden connections in social networks from limited diffusion data, offering a computationally efficient solution for applications like online social analysis, though it is incremental as it builds on variational inference methods.
The paper tackles the network reconstruction problem from observed diffusion cascades, such as respondent-driven sampling, by proving the posterior distribution is Bayesian log-submodular and proposing VINE, a variational inference algorithm that recovers connected graphs with high accuracy in experiments on real-life networks.
In real-world and online social networks, individuals receive and transmit information in real time. Cascading information transmissions (e.g. phone calls, text messages, social media posts) may be understood as a realization of a diffusion process operating on the network, and its branching path can be represented by a directed tree. The process only traverses and thus reveals a limited portion of the edges. The network reconstruction/inference problem is to infer the unrevealed connections. Most existing approaches derive a likelihood and attempt to find the network topology maximizing the likelihood, a problem that is highly intractable. In this paper, we focus on the network reconstruction problem for a broad class of real-world diffusion processes, exemplified by a network diffusion scheme called respondent-driven sampling (RDS). We prove that under realistic and general models of network diffusion, the posterior distribution of an observed RDS realization is a Bayesian log-submodular model.We then propose VINE (Variational Inference for Network rEconstruction), a novel, accurate, and computationally efficient variational inference algorithm, for the network reconstruction problem under this model. Crucially, we do not assume any particular probabilistic model for the underlying network. VINE recovers any connected graph with high accuracy as shown by our experimental results on real-life networks.