Scalable Structure Learning of Bayesian Networks by Learning Algorithm Ensembles
This work addresses scalable structure learning for Bayesian networks, which matters for large datasets in fields such as bioinformatics and machine learning. It is a strong incremental advance over existing divide-and-conquer methods.
The paper tackles unstable accuracy in scalable Bayesian network structure learning by introducing structure learning ensembles (SLEs), which combine multiple algorithms, together with an automatic method (Auto-SLE) for learning them. It reports accuracy improvements of usually 30% to 225% on datasets with 10,000 variables, and the learned ensemble generalizes to larger datasets (e.g., 30,000 variables).
Learning the structure of Bayesian networks (BNs) from data is challenging, especially for datasets involving a large number of variables. The recently proposed divide-and-conquer (D\&D) strategies are a promising approach for learning large BNs. However, they still suffer from a major issue: learning accuracy is unstable across subproblems. In this work, we introduce the idea of employing a structure learning ensemble (SLE), which combines multiple BN structure learning algorithms, to consistently achieve high learning accuracy. Because manually designing high-quality SLEs is difficult, we further propose an automatic approach called Auto-SLE for learning near-optimal SLEs. The learned SLE is then integrated into a D\&D method. Extensive experiments show the superiority of our method over D\&D methods that use a single BN structure learning algorithm when learning large BNs, with accuracy improvements usually in the range of 30\%$\sim$225\% on datasets involving 10,000 variables. Furthermore, our method generalizes well to datasets with many more (e.g., 30,000) variables and with network characteristics different from those present in the training data used for learning the SLE. These results indicate the significant potential of employing (automatically learned) SLEs for scalable BN structure learning.
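The abstract does not say how an SLE combines its members or how Auto-SLE builds it, so the following is only an illustrative sketch under two stated assumptions: (1) an SLE runs every member algorithm on a subproblem and keeps the best-scoring result, and (2) the ensemble is built greedily, adding at each step the candidate that most improves average quality on a set of training problems. The names `ensemble_quality`, `greedy_sle`, and the score table are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def ensemble_quality(members: List[str], scores: Dict[str, List[float]]) -> float:
    """Average, over training problems, of the best score among the members.

    ASSUMPTION: the ensemble keeps the best member result per problem.
    An empty ensemble has quality 0.0.
    """
    if not members:
        return 0.0
    n_problems = len(next(iter(scores.values())))
    total = sum(max(scores[m][i] for m in members) for i in range(n_problems))
    return total / n_problems


def greedy_sle(scores: Dict[str, List[float]], k: int) -> List[str]:
    """Greedily pick up to k candidate algorithms (hypothetical procedure).

    scores[name][i] is the quality (higher is better) that candidate
    `name` achieved on training problem i.
    """
    chosen: List[str] = []
    for _ in range(k):
        current = ensemble_quality(chosen, scores)
        best_name, best_gain = None, 0.0
        for name in scores:
            if name in chosen:
                continue
            gain = ensemble_quality(chosen + [name], scores) - current
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no candidate improves the ensemble
            break
        chosen.append(best_name)
    return chosen


if __name__ == "__main__":
    # Made-up qualities of three candidate algorithms on three problems.
    demo_scores = {
        "A": [0.9, 0.2, 0.5],
        "B": [0.3, 0.8, 0.4],
        "C": [0.8, 0.25, 0.5],
    }
    print(greedy_sle(demo_scores, k=3))
```

In this toy table, candidate B is weaker than C on average but complements A on the second problem, so the greedy step prefers it. This illustrates how an ensemble could stabilize accuracy across subproblems where any single algorithm is unreliable.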