Learning big Gaussian Bayesian networks: partition, estimation, and fusion
This work addresses the challenge of scaling Bayesian network structure learning to big data in various domains, offering an incremental improvement through a flexible divide-and-conquer approach.
The paper tackles structure learning for large Bayesian networks with thousands of nodes and limited samples by developing a partition-estimation-fusion (PEF) framework, which improves accuracy by 20% or more and reduces running time by up to two orders of magnitude compared to existing methods.
Structure learning of Bayesian networks has always been a challenging problem. Nowadays, massive networks with thousands of nodes or more, but far fewer samples, frequently appear in many areas. We develop a divide-and-conquer framework, called partition-estimation-fusion (PEF), for structure learning of such big networks. The proposed method first partitions nodes into clusters, then learns a subgraph on each cluster of nodes, and finally fuses all learned subgraphs into one Bayesian network. The PEF method is designed in a flexible way so that any structure learning method may be used in the second step to learn a subgraph structure as either a DAG or a CPDAG. In the clustering step, we adapt the hierarchical clustering method to automatically choose a proper number of clusters. In the fusion step, we propose a novel hybrid method that sequentially adds edges between subgraphs. Extensive numerical experiments demonstrate the competitive performance of our PEF method in terms of both speed and accuracy compared to existing methods. Our method can improve the accuracy of structure learning by 20% or more, while reducing running time by up to two orders of magnitude.
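The three-step pipeline can be illustrated with a minimal sketch. This is not the paper's algorithm. It only shows the control flow of partition, estimation, and fusion under strong simplifying assumptions: the number of clusters is supplied by the caller rather than chosen automatically, the subgraph learner is a hypothetical stand-in that thresholds marginal correlations (so it returns undirected edges, not a DAG or CPDAG), and the fusion step is a crude correlation screen rather than the hybrid method proposed in the paper. All function names, the dissimilarity 1 - |corr|, and the 0.3 threshold are illustrative choices.

```python
import math
import random
from itertools import combinations


def correlation(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partition(columns, n_clusters):
    """Step 1: average-linkage clustering of nodes with dissimilarity 1 - |corr|.

    Assumption: n_clusters is given by the caller; the paper instead chooses
    the number of clusters automatically.
    """
    dist = {
        (i, j): 1.0 - abs(correlation(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    }

    def linkage(a, b):
        # Mean pairwise dissimilarity between two clusters of node indices.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(columns))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def estimate_subgraph(columns, nodes, threshold=0.3):
    """Step 2 placeholder: undirected edges among `nodes` with |corr| > threshold.

    Any structure learner with the same signature could be plugged in here.
    """
    return {
        (i, j)
        for i, j in combinations(sorted(nodes), 2)
        if abs(correlation(columns[i], columns[j])) > threshold
    }


def fuse(columns, clusters, subgraphs, threshold=0.3):
    """Step 3 placeholder: union the subgraphs, then add between-cluster edges."""
    edges = set().union(*subgraphs)
    for ca, cb in combinations(clusters, 2):
        for i in ca:
            for j in cb:
                if abs(correlation(columns[i], columns[j])) > threshold:
                    edges.add((min(i, j), max(i, j)))
    return edges


def pef(columns, n_clusters, learner=estimate_subgraph):
    """Run partition, estimation, and fusion in sequence."""
    clusters = partition(columns, n_clusters)
    subgraphs = [learner(columns, c) for c in clusters]
    return clusters, fuse(columns, clusters, subgraphs)


def simulate_two_chains(n=500, seed=0):
    """Toy data: two independent Gaussian chains, X0 -> X1 -> X2 and X3 -> X4 -> X5."""
    rng = random.Random(seed)
    columns = [[] for _ in range(6)]
    for _ in range(n):
        for start in (0, 3):
            value = rng.gauss(0, 1)
            for k in range(3):
                columns[start + k].append(value)
                value = 0.9 * value + rng.gauss(0, 1)
    return columns
```

Calling `pef(simulate_two_chains(), n_clusters=2)` on this toy data should separate the two chains into their own clusters and add no edges between them. The `learner` argument mirrors the flexibility described above: the second step accepts any per-cluster structure learner, and the first and third steps do not depend on which one is used.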