DSDMLGMay 20, 2021

On the Parameterized Complexity of Polytree Learning

arXiv:2105.09675v19 citations
Originality Incremental advance
AI Analysis

This addresses a theoretical bottleneck for data scientists working with polytree-structured Bayesian networks, providing new complexity results that are incremental to existing knowledge.

The paper tackles the computational complexity of learning optimal Bayesian networks that are polytrees, showing it can be solved in 3^n * |I|^{O(1)} time, but lacks efficient algorithms when parameterized by the number of variables with nonempty parent sets, unlike general Bayesian network learning.

A Bayesian network is a directed acyclic graph that represents statistical dependencies between variables of a joint probability distribution. A fundamental task in data science is to learn a Bayesian network from observed data. \textsc{Polytree Learning} is the problem of learning an optimal Bayesian network that fulfills the additional property that its underlying undirected graph is a forest. In this work, we revisit the complexity of \textsc{Polytree Learning}. We show that \textsc{Polytree Learning} can be solved in $3^n \cdot |I|^{\mathcal{O}(1)}$ time where $n$ is the number of variables and $|I|$ is the total instance size. Moreover, we consider the influence of the number of variables $d$ that might receive a nonempty parent set in the final DAG on the complexity of \textsc{Polytree Learning}. We show that \textsc{Polytree Learning} has no $f(d)\cdot |I|^{\mathcal{O}(1)}$-time algorithm, unlike Bayesian network learning which can be solved in $2^d \cdot |I|^{\mathcal{O}(1)}$ time. We show that, in contrast, if $d$ and the maximum parent set size are bounded, then we can obtain efficient algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes