LG IT PR ST MLDec 9, 2016

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

arXiv:1612.03164v115.451 citations

Originality Incremental advance

AI Analysis

This addresses the problem of efficient identity testing for Bayesian networks, which is incremental but provides improved sample complexities for statisticians and machine learning practitioners.

The paper tackles the problem of bounding the square Hellinger distance between Bayesian networks by proving a subadditivity inequality with respect to graph neighborhoods, and applies this to derive sample complexity bounds for identity testing, achieving optimal or near-optimal rates in specific cases such as trees or product distributions.

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{\{v\} \cup Π_v}, Q_{\{v\} \cup Π_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $Π_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\rm TV}(P,Q)>ε$ can be performed from $\tilde{O}(|Σ|^{3/4(d+1)} \cdot n/ε^2)$ samples, where $d$ is the maximum in-degree of the DAG and $Σ$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|Σ|^{4.5} n/ε^2)$, whose dependence on $n, ε$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/ε^2)$, which is optimal up to constant factors.

View on arXiv PDF

Similar