Learning Bayesian Networks Under Sparsity Constraints: A Parameterized Complexity Analysis
This work addresses computational complexity challenges in Bayesian network learning for researchers in machine learning and algorithms, providing theoretical insights into tractability under sparsity, but it is incremental as it builds on prior parameterized complexity analyses.
The paper tackles the problem of learning optimal Bayesian network structures under sparsity constraints, showing that polynomial-time algorithms exist for certain constant-distance cases (e.g., vertex deletion distance to graphs with max degree 1), while proving NP-hardness for others (e.g., max degree 2 or connected components of size at least 3) and establishing parameterized complexity results (e.g., no FPT algorithm for moralized graph edge constraints but an FPT algorithm for arc constraints).
We study the problem of learning the structure of an optimal Bayesian network when additional constraints are posed on the network or on its moralized graph. More precisely, we consider the constraint that the network or its moralized graph are close, in terms of vertex or edge deletions, to a sparse graph class $Π$. For example, we show that learning an optimal network whose moralized graph has vertex deletion distance at most $k$ from a graph with maximum degree 1 can be computed in polynomial time when $k$ is constant. This extends previous work that gave an algorithm with such a running time for the vertex deletion distance to edgeless graphs [Korhonen & Parviainen, NIPS 2015]. We then show that further extensions or improvements are presumably impossible. For example, we show that learning optimal networks where the network or its moralized graph have maximum degree $2$ or connected components of size at most $c$, $c\ge 3$, is NP-hard. Finally, we show that learning an optimal network with at most $k$ edges in the moralized graph presumably has no $f(k)\cdot |I|^{O(1)}$-time algorithm and that, in contrast, an optimal network with at most $k$ arcs can be computed in $2^{O(k)}\cdot |I|^{O(1)}$ time where $|I|$ is the total input size.