Learning Bayesian Networks with Incomplete Data by Augmentation
This paper addresses a key problem in probabilistic modeling for domains with missing data, though the contribution appears incremental because it builds on existing structure learning methods.
The paper tackles learning Bayesian networks from incomplete data. It introduces an exact algorithm based on data augmentation, which the authors believe to be the first exact algorithm for this problem, and an approximate hill-climbing algorithm for scalability. The experiments are reported to show benefits from the approach.
We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with this new approach.
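To make the hill-climbing idea concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: it only illustrates one plausible reading of the abstract, in which missing entries are treated as values to be chosen and each candidate completion is scored by running a structure learner for complete data. Everything in it is an assumption made for illustration, including the names `bic_score`, `learn_structure`, and `hill_climb`, the use of BIC as the score, the restriction to two binary variables, and the two candidate structures.

```python
import math

def log_lik_counts(counts):
    """Maximised multinomial log-likelihood for a list of counts."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)

def bic_score(rows, graph):
    """BIC of a complete binary dataset; graph maps child -> tuple of parents."""
    n = len(rows)
    score = 0.0
    for child, parents in graph.items():
        groups = {}
        for r in rows:
            key = tuple(r[p] for p in parents)
            groups.setdefault(key, [0, 0])[r[child]] += 1
        score += sum(log_lik_counts(c) for c in groups.values())
        # One free parameter per parent configuration of a binary child.
        score -= 0.5 * math.log(n) * (2 ** len(parents))
    return score

# Hypothetical candidate set: the empty graph and the single edge 0 -> 1.
CANDIDATES = [{0: (), 1: ()}, {0: (), 1: (0,)}]

def learn_structure(rows):
    """Toy stand-in for a complete-data structure learner (assumption)."""
    return max(((bic_score(rows, g), g) for g in CANDIDATES), key=lambda t: t[0])

def hill_climb(data):
    """Greedily flip missing binary entries while the learned score improves."""
    missing = [(i, j) for i, r in enumerate(data)
               for j, v in enumerate(r) if v is None]
    # Start from an arbitrary completion: fill every missing entry with 0.
    rows = [[0 if v is None else v for v in r] for r in data]
    best_score, best_graph = learn_structure(rows)
    improved = True
    while improved:
        improved = False
        for i, j in missing:
            rows[i][j] ^= 1  # try the other value for this missing entry
            score, graph = learn_structure(rows)
            if score > best_score + 1e-12:
                best_score, best_graph = score, graph
                improved = True
            else:
                rows[i][j] ^= 1  # revert the flip
    return rows, best_graph, best_score

# Two binary variables that agree in every fully observed row; None marks a missing value.
data = [[0, 0] for _ in range(4)] + [[1, 1] for _ in range(4)]
data += [[1, None], [None, 1]]

completed, graph, score = hill_climb(data)
print(completed[-2:], graph, round(score, 3))
```

In this sketch the structure learner is a black box called on each candidate completion, which mirrors the abstract's remark that the approximate method relies on a suitable structure learning method for complete data being available. A practical version would replace `learn_structure` with a real learner and would avoid relearning from scratch after every single flip.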