ML IT LG · Sep 20, 2019

Optimal Rates for Learning Hidden Tree Structures

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate

arXiv:1909.09596v4 · 7.77 citations

Originality: Incremental advance

AI Analysis

The paper establishes fundamental limits for structure learning in hidden tree models, an incremental but important advance for statistical inference and machine learning applications.

The paper tackles the problem of learning hidden tree-structured graphical models from noisy data, showing that the Chow-Liu algorithm achieves optimal sample complexity rates inversely proportional to the information threshold squared, with matching upper and lower bounds.
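The matching bounds can be written schematically as follows. This is only a sketch of the shape of the result: the symbols $n$ (number of samples), $p$ (number of nodes), $\delta$ (failure probability), $I^{o}$ (noisy information threshold), the constants $C, c$, and the function $g$ are notation assumed here for illustration and are not taken from the paper.

```latex
% Sufficient condition (Chow-Liu on noisy data), assuming I^{o} > 0:
% g denotes some function that grows almost logarithmically in p/\delta.
n \;\ge\; \frac{C \, g(p/\delta)}{\left(I^{o}\right)^{2}}
\quad\Longrightarrow\quad
\Pr\bigl(\text{exact structure recovery}\bigr) \;\ge\; 1-\delta .

% Necessary condition (any algorithm):
n \;<\; \frac{c}{\left(I^{o}\right)^{2}}
\quad\Longrightarrow\quad
\Pr\bigl(\text{exact structure recovery}\bigr) \;\le\; \tfrac{1}{2} .
```

Both conditions scale as $1/(I^{o})^{2}$, which is the sense in which the upper and lower bounds match with respect to the information threshold.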

We provide high probability finite sample complexity guarantees for hidden non-parametric structure learning of tree-shaped graphical models, whose hidden and observable nodes are discrete random variables with either finite or countable alphabets. We study a fundamental quantity called the (noisy) information threshold, which arises naturally from the error analysis of the Chow-Liu algorithm and, as we discuss, provides explicit necessary and sufficient conditions on sample complexity, by effectively summarizing the difficulty of the tree-structure learning problem. Specifically, we show that the finite sample complexity of the Chow-Liu algorithm for ensuring exact structure recovery from noisy data is inversely proportional to the information threshold squared (provided it is positive), and scales almost logarithmically relative to the number of nodes over a given probability of failure. Conversely, we show that, if the number of samples is less than an absolute constant times the inverse of information threshold squared, then no algorithm can recover the hidden tree structure with probability greater than one half. As a consequence, our upper and lower bounds match with respect to the information threshold, indicating that it is a fundamental quantity for the problem of learning hidden tree-structured models. Further, the Chow-Liu algorithm with noisy data as input achieves the optimal rate with respect to the information threshold. Lastly, as a byproduct of our analysis, we resolve the problem of tree structure learning in the presence of non-identically distributed observation noise, providing conditions for convergence of the Chow-Liu algorithm under this setting, as well.
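To make the setting concrete, here is a minimal Python sketch of the Chow-Liu procedure applied to noisy observations: estimate the mutual information of every pair of observed variables from samples, then take a maximum-weight spanning tree. It is an illustration under stated assumptions, not the authors' implementation. The function names, the plug-in mutual information estimator, and the toy model (a binary Markov chain observed through independent bit flips) are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter
from itertools import combinations


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree under empirical mutual
    information. `samples` is a list of equal-length rows of symbols."""
    p = len(samples[0])
    columns = [[row[i] for row in samples] for i in range(p)]
    weights = [
        (empirical_mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(p), 2)
    ]
    weights.sort(reverse=True)  # Kruskal: heaviest edges first

    parent = list(range(p))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = set()
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.add((i, j))
    return edges


def sample_noisy_chain(n, p=4, flip=0.1, noise=0.1, seed=0):
    """Toy model (an assumption for this example): hidden binary chain
    X0 - X1 - ... - X(p-1); each observed bit is the hidden bit flipped
    independently with probability `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.randint(0, 1)]
        for _ in range(p - 1):
            x.append(x[-1] ^ (rng.random() < flip))
        rows.append([xi ^ (rng.random() < noise) for xi in x])
    return rows
```

Calling `chow_liu_tree(sample_noisy_chain(20000))` runs the procedure on noisy data only; the hidden variables are never passed to the estimator. Shrinking the sample size or raising `noise` narrows the gap between the mutual information of true edges and non-edges, which is the kind of difficulty the information threshold is meant to summarize.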
