Revisiting Differentiable Structure Learning: Inconsistency of $\ell_1$ Penalty and Beyond
This work addresses a fundamental limitation of differentiable structure learning for causal discovery, particularly when learning Markov equivalence classes, though its contribution is incremental relative to prior work.
The paper tackles the inconsistency of differentiable structure learning methods that use an ℓ₁ penalty, showing that they fail even at the global optimum in the linear Gaussian case. The authors propose a hybrid method that combines an ℓ₀ penalty with moral graph estimation and improves empirical performance both before and after data standardization.
Recent advances in differentiable structure learning have framed the combinatorial problem of learning directed acyclic graphs as a continuous optimization problem. Various aspects, including data standardization, have been studied to identify factors that influence the empirical performance of these methods. In this work, we investigate critical limitations in differentiable structure learning methods, focusing on settings where the true structure can be identified up to Markov equivalence classes, particularly in the linear Gaussian case. While Ng et al. (2024) highlighted potential non-convexity issues in this setting, we demonstrate and explain why the use of $\ell_1$-penalized likelihood in such cases is fundamentally inconsistent, even if the global optimum of the optimization problem can be found. To resolve this limitation, we develop a hybrid differentiable structure learning method based on $\ell_0$-penalized likelihood with a hard acyclicity constraint, where the $\ell_0$ penalty can be approximated by different techniques including Gumbel-Softmax. Specifically, we first estimate the underlying moral graph and use it to restrict the search space of the optimization problem, which helps alleviate the non-convexity issue. Experimental results show that the proposed method enhances empirical performance both before and after data standardization, providing a more reliable path for future advancements in differentiable structure learning, especially for learning Markov equivalence classes.
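To make the $\ell_1$ inconsistency claim concrete, the following is a minimal population-level sketch, not the paper's own construction. It assumes a hypothetical three-variable linear Gaussian model with a v-structure X1 -> X3 <- X2, edge weights of 3, and unit noise variances. The helper names `sem_for_order` and `implied_cov` are introduced here for illustration only. Regressing each variable on its predecessors under a different causal order yields another linear Gaussian model with exactly the same covariance, and therefore the same Gaussian likelihood, so only the penalty separates the two candidates.

```python
import numpy as np

def sem_for_order(cov, order):
    """Fit the linear SEM that reproduces `cov` under a given causal order.

    Each variable is regressed on all of its predecessors in `order`.
    B[i, j] is the weight of edge i -> j; omega[j] is the noise variance of j.
    """
    d = cov.shape[0]
    B = np.zeros((d, d))
    omega = np.zeros(d)
    for k, j in enumerate(order):
        parents = list(order[:k])
        if parents:
            coef = np.linalg.solve(cov[np.ix_(parents, parents)], cov[parents, j])
            B[parents, j] = coef
            omega[j] = cov[j, j] - cov[parents, j] @ coef
        else:
            omega[j] = cov[j, j]
    return B, omega

def implied_cov(B, omega):
    """Covariance of X when X = B^T X + e and Cov(e) = diag(omega)."""
    d = B.shape[0]
    A = np.linalg.inv(np.eye(d) - B.T)
    return A @ np.diag(omega) @ A.T

def l0(B, tol=1e-8):
    """Number of edges (nonzero weights up to a numerical tolerance)."""
    return int(np.sum(np.abs(B) > tol))

def l1(B):
    """Sum of absolute edge weights."""
    return float(np.sum(np.abs(B)))

# Hypothetical ground truth: X1 -> X3 <- X2 (indices 0, 1, 2), weights 3.
B_true = np.zeros((3, 3))
B_true[0, 2] = 3.0
B_true[1, 2] = 3.0
cov = implied_cov(B_true, np.ones(3))

# Alternative order X3, X1, X2: a denser DAG outside the true equivalence class.
B_alt, omega_alt = sem_for_order(cov, (2, 0, 1))

print("same covariance:", np.allclose(implied_cov(B_alt, omega_alt), cov))
print("true DAG: l0 =", l0(B_true), " l1 =", l1(B_true))
print("alt  DAG: l0 =", l0(B_alt), " l1 =", round(l1(B_alt), 4))
```

Under these assumed weights, the alternative DAG has three edges instead of two, yet its $\ell_1$ norm is about 1.36 versus 6 for the true graph. An $\ell_1$-penalized likelihood therefore scores the denser graph strictly better than the truth even with infinite data, while an $\ell_0$ penalty ranks the two graphs correctly.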