High-dimensional Gaussian graphical model for network-linked data
This work addresses the challenge of analyzing network-linked data in statistics and machine learning, offering a novel method for a specific bottleneck rather than a broad breakthrough.
The authors tackled the problem of estimating Gaussian graphical models from high-dimensional data where observations are linked by a network, violating standard independence assumptions, by developing a model that allows for smoothly varying means over the network and proving correct estimation under network cohesion assumptions.
Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network “cohesion”, which refers to the empirically observed phenomenon of network neighbors sharing similar traits.