Recoverability Landscape of Tree Structured Markov Random Fields under Symmetric Noise
This addresses a fundamental limitation in graphical model learning for discrete variables with noise, offering theoretical insights and practical algorithms for researchers in machine learning and statistics.
The paper tackles the problem of learning tree-structured Markov random fields from observations corrupted by symmetric noise, showing that for support sizes of 3 or more, partial or full identifiability of leaf clusters is possible, and provides necessary and sufficient conditions for exact recoverability along with a polynomial-time algorithm.
We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by a $k$-ary symmetric noise channel with unknown probability of error. For Ising models (support size = 2), past work has shown that graph structure can only be recovered up to the leaf clusters (a leaf node, its parent, and its siblings form a leaf cluster) and exact recovery is impossible. No prior work has addressed the setting of support size of 3 or more, and indeed this setting is far richer. As we show, when the support size is 3 or more, the structure of the leaf clusters may be partially or fully identifiable. We provide a precise characterization of this phenomenon and show that the extent of recoverability is dictated by the joint PMF of the random variables. In particular, we provide necessary and sufficient conditions for exact recoverability. Furthermore, we present a polynomial time, sample efficient algorithm that recovers the exact tree when this is possible, or up to the unidentifiability as promised by our characterization, when full recoverability is impossible. Finally, we demonstrate the efficacy of our algorithm experimentally.