DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks
This work addresses the problem of learning Bayesian networks more accurately and efficiently for researchers and practitioners in machine learning, though it is incremental as it builds on the existing NOTEARS framework.
The paper re-examines the NOTEARS continuous optimization framework for learning Bayesian networks, showing that its original KKT conditions cannot be satisfied except trivially, and proposes a local search post-processing algorithm that improves structural Hamming distance by a factor of 2 or more for all tested algorithms.
This paper re-examines a continuous optimization framework dubbed NOTEARS for learning Bayesian networks. We first generalize existing algebraic characterizations of acyclicity to a class of matrix polynomials. Next, focusing on a one-parameter-per-edge setting, it is shown that the Karush-Kuhn-Tucker (KKT) optimality conditions for the NOTEARS formulation cannot be satisfied except in a trivial case, which explains a behavior of the associated algorithm. We then derive the KKT conditions for an equivalent reformulation, show that they are indeed necessary, and relate them to explicit constraints that certain edges be absent from the graph. If the score function is convex, these KKT conditions are also sufficient for local minimality despite the non-convexity of the constraint. Informed by the KKT conditions, a local search post-processing algorithm is proposed and shown to substantially and universally improve the structural Hamming distance of all tested algorithms, typically by a factor of 2 or more. Some combinations with local search are both more accurate and more efficient than the original NOTEARS.