IT ST ML · Jan 10, 2015

On model misspecification and KL separation for Gaussian graphical models

arXiv:1501.02320v27.316 citations

Originality: Incremental advance

AI Analysis

This work addresses model misspecification in Gaussian graphical models, a problem relevant to statisticians and machine learning practitioners. It gives a rigorous account of why accurate edge estimation matters, but the advance is incremental because it builds on existing KL divergence and graphical model theory.

The paper establishes a lower bound on the KL divergence between multivariate Gaussian distributions in terms of the Hamming distance between their graphical model edge sets, showing that the divergence is bounded below by a constant whenever the graphs differ by at least one edge. From this bound it derives a sample size requirement for correct model selection via maximum likelihood estimation.

We establish bounds on the KL divergence between two multivariate Gaussian distributions in terms of the Hamming distance between the edge sets of the corresponding graphical models. We show that the KL divergence is bounded below by a constant when the graphs differ by at least one edge; this is essentially the tightest possible bound, since classes of graphs exist for which the edge discrepancy increases but the KL divergence remains bounded above by a constant. As a natural corollary to our KL lower bound, we also establish a sample size requirement for correct model selection via maximum likelihood estimation. Our results rigorize the notion that it is essential to estimate the edge structure of a Gaussian graphical model accurately in order to approximate the true distribution to close precision.
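The two quantities the abstract relates can be computed directly for small examples. The following is a minimal sketch, not the paper's construction: it assumes zero-mean Gaussians parameterized by their precision (inverse covariance) matrices, treats a nonzero off-diagonal precision entry as an edge, and uses hypothetical names (`gaussian_kl`, `edge_set`, `hamming_distance`, `theta_p`, `theta_q`). It does not reproduce the constant in the paper's lower bound.

```python
import numpy as np


def gaussian_kl(theta_p, theta_q):
    """KL(P || Q) for zero-mean Gaussians with precision matrices theta_p, theta_q.

    Uses KL = 0.5 * (tr(Theta_q Sigma_p) - d + log det Theta_p - log det Theta_q),
    where Sigma_p is the inverse of Theta_p and d is the dimension.
    """
    d = theta_p.shape[0]
    sigma_p = np.linalg.inv(theta_p)
    _, logdet_p = np.linalg.slogdet(theta_p)
    _, logdet_q = np.linalg.slogdet(theta_q)
    return 0.5 * (np.trace(theta_q @ sigma_p) - d + logdet_p - logdet_q)


def edge_set(theta, tol=1e-12):
    """Edges (i, j), i < j, where the precision entry is nonzero (assumed convention)."""
    d = theta.shape[0]
    return {(i, j) for i in range(d) for j in range(i + 1, d) if abs(theta[i, j]) > tol}


def hamming_distance(theta_a, theta_b):
    """Number of edges present in exactly one of the two graphs."""
    return len(edge_set(theta_a) ^ edge_set(theta_b))


# Illustrative example (assumed values): an empty graph versus a single-edge graph.
theta_p = np.eye(3)
theta_q = np.eye(3)
theta_q[0, 1] = theta_q[1, 0] = 0.5  # adds the edge (0, 1); matrix stays positive definite

kl = gaussian_kl(theta_p, theta_q)
dist = hamming_distance(theta_p, theta_q)
print(dist, kl)
```

In this example the graphs differ by one edge and the divergence equals -0.5 * log(0.75), roughly 0.1438. The gap is strictly positive here, which is the kind of separation the lower bound concerns, though the specific value depends on the assumed edge weight.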
