Neural Autoregressive Flows for Markov Boundary Learning
For practitioners needing reliable and efficient Markov boundary discovery, this work offers a scalable method with theoretical support, though it is an incremental improvement over existing scoring-based approaches.
The paper proposes a framework for Markov boundary discovery using conditional entropy as a scoring criterion, implemented via a novel masked autoregressive network and a parallelizable greedy search. The method achieves superior performance and scalability on real-world and synthetic datasets.
Recovering the Markov boundary -- the minimal set of variables that maximizes predictive performance for a response variable -- is crucial in many applications. While recent advances improve upon traditional constraint-based techniques by scoring local causal structures, they still rely on nonparametric estimators and heuristic searches, and they lack theoretical guarantees of reliability. This paper investigates a framework for efficient Markov boundary discovery that uses conditional entropy, a quantity from information theory, as the scoring criterion. We design a novel masked autoregressive network to capture complex dependencies. We propose a parallelizable greedy search strategy that runs in polynomial time and is supported by analytical evidence. We also discuss how initializing a graph with learned Markov boundaries accelerates the convergence of causal discovery. Comprehensive evaluations on real-world and synthetic datasets demonstrate the scalability and superior performance of our method in both Markov boundary discovery and causal discovery tasks.
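To make the scoring idea concrete, the following is a minimal sketch of greedy forward selection driven by conditional entropy. It is not the paper's implementation: it assumes discrete variables and replaces the masked autoregressive network with a simple plug-in (count-based) entropy estimate. The function names `conditional_entropy` and `greedy_markov_boundary` and the stopping threshold `tol` are hypothetical choices made for illustration.

```python
from collections import Counter
import math

import numpy as np


def conditional_entropy(X, y, subset):
    """Plug-in estimate of H(y | X[:, subset]) in nats, for discrete data."""
    rows, labels = X.tolist(), y.tolist()
    n = len(labels)
    # Each sample's conditioning context is the tuple of its selected columns.
    keys = [tuple(row[j] for j in subset) for row in rows]
    joint = Counter(zip(keys, labels))
    marginal = Counter(keys)
    return -sum((c / n) * math.log(c / marginal[k]) for (k, _), c in joint.items())


def greedy_markov_boundary(X, y, tol=1e-3):
    """Add the variable that lowers H(y | selected) most; stop when the gain <= tol."""
    selected = []
    remaining = list(range(X.shape[1]))
    current = conditional_entropy(X, y, selected)
    while remaining:
        # Candidate scores within a round are independent of each other,
        # so this loop could be evaluated in parallel.
        scores = {j: conditional_entropy(X, y, selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] <= tol:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return sorted(selected)


# Toy data (an assumption for this sketch): y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(2000, 5))
y_demo = X_demo[:, 0] + 2 * X_demo[:, 2]
print(greedy_markov_boundary(X_demo, y_demo))
```

Each round scores at most one candidate per remaining variable and there are at most as many rounds as variables, so the number of score evaluations grows quadratically with the number of variables. A purely forward greedy search like this one can miss variables that are informative only jointly (for example, an XOR relationship), and a plug-in estimate degrades quickly as the conditioning set grows, which is one motivation for a learned conditional density model.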