Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning
This work addresses the challenge of comparing complex distributions in machine learning and statistics, with results targeted at specific classes of structured models.
The paper tackles the problem of efficiently approximating the total variation distance between structured high-dimensional distributions, such as Bayesian networks and Ising models, by providing algorithms that achieve additive error ε using polynomially many samples and polynomial time. These are the first efficient distance approximation algorithms for these problems, and they enable new tolerant testing methods.
We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions. Specifically, we show algorithms for the following problems:
- Given sample access to two Bayesian networks $P_1$ and $P_2$ over known directed acyclic graphs $G_1$ and $G_2$ having $n$ nodes and bounded in-degree, approximate $d_{tv}(P_1,P_2)$ to within additive error $ε$ using $poly(n,1/ε)$ samples and time.
- Given sample access to two ferromagnetic Ising models $P_1$ and $P_2$ on $n$ variables with bounded width, approximate $d_{tv}(P_1, P_2)$ to within additive error $ε$ using $poly(n,1/ε)$ samples and time.
- Given sample access to two $n$-dimensional Gaussians $P_1$ and $P_2$, approximate $d_{tv}(P_1, P_2)$ to within additive error $ε$ using $poly(n,1/ε)$ samples and time.
- Given access to observations from two causal models $P$ and $Q$ on $n$ variables that are defined over known causal graphs, approximate $d_{tv}(P_a, Q_a)$ to within additive error $ε$ using $poly(n,1/ε)$ samples, where $P_a$ and $Q_a$ are the interventional distributions obtained by the intervention $do(A=a)$ on $P$ and $Q$ respectively for a particular variable $A$.
Our results are the first efficient distance approximation algorithms for these well-studied problems. They are derived using a simple and general connection to distribution learning algorithms. The distance approximation algorithms imply new efficient algorithms for {\em tolerant} testing of closeness of the above-mentioned structured high-dimensional distributions.
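The connection to distribution learning can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the simplest possible case, a product distribution over $\{0,1\}^n$ (a Bayesian network whose graph has no edges), and all function names are hypothetical. Each distribution is first learned from samples, and the distance between the two learned distributions is then estimated by Monte Carlo using the standard identity $d_{tv}(P,Q) = \mathbb{E}_{x \sim P}[\max(0, 1 - Q(x)/P(x))]$, which only requires sampling from the learned $P$ and evaluating both learned probability mass functions.

```python
import math
import random


def sample_product(biases, rng):
    """Draw one sample from a product distribution over {0,1}^n."""
    return [1 if rng.random() < p else 0 for p in biases]


def learn_product(samples):
    """Learn per-coordinate biases with add-one smoothing.

    Smoothing keeps every learned bias strictly inside (0, 1),
    so the log-probabilities below are always finite.
    """
    m = len(samples)
    n = len(samples[0])
    return [(sum(x[i] for x in samples) + 1) / (m + 2) for i in range(n)]


def log_pmf(biases, x):
    """Log-probability of the point x under a product distribution."""
    return sum(math.log(p if xi else 1.0 - p) for p, xi in zip(biases, x))


def tv_estimate(b1, b2, num_draws, rng):
    """Monte Carlo estimate of d_tv(P, Q) = E_{x~P}[max(0, 1 - Q(x)/P(x))]."""
    total = 0.0
    for _ in range(num_draws):
        x = sample_product(b1, rng)
        ratio = math.exp(log_pmf(b2, x) - log_pmf(b1, x))
        total += max(0.0, 1.0 - ratio)
    return total / num_draws


def approx_distance(samples1, samples2, num_draws=20000, seed=0):
    """Learn both distributions, then estimate the distance between the learned ones."""
    rng = random.Random(seed)
    learned1 = learn_product(samples1)
    learned2 = learn_product(samples2)
    return tv_estimate(learned1, learned2, num_draws, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    s1 = [sample_product([0.2, 0.5, 0.9], rng) for _ in range(5000)]
    s2 = [sample_product([0.7, 0.5, 0.9], rng) for _ in range(5000)]
    print(approx_distance(s1, s2))
```

A distance estimate of this kind also gives a tolerant closeness tester: to distinguish $d_{tv}(P_1,P_2) \le ε_1$ from $d_{tv}(P_1,P_2) \ge ε_2$, approximate the distance to within additive error $(ε_2 - ε_1)/2$ and compare the estimate against the midpoint $(ε_1 + ε_2)/2$.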