ML LG Aug 29, 2022

Approach of variable clustering and compression for learning large Bayesian networks

arXiv:2208.13605v12.1

Originality: Incremental advance

AI Analysis

This work addresses scalability issues in Bayesian network learning for data scientists, though it is incremental as it builds on existing methods like Hill-Climbing.

The paper tackles the problem of learning large Bayesian network structures by introducing a method that clusters features using normalized mutual information and compresses each block's information before passing it to classical learning algorithms. The authors evaluate the resulting gains in both speed and accuracy.

This paper describes a new approach to learning the structure of large Bayesian networks, based on blocks obtained by clustering the feature space. The clustering uses normalized mutual information. The blocks are then aggregated with classical learning methods, except that these methods receive as input compressed information about the combinations of feature values within each block. The approach is validated with Hill-Climbing as the graph enumeration algorithm and two score functions: BIC and MI. In this way, potentially parallelizable block learning can be implemented even for score functions that are considered unsuitable for parallelizable learning. The advantage of the approach is evaluated in terms of both running time and the accuracy of the structures found.
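The clustering and compression steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization of mutual information or which clustering rule the authors use, so everything below is an assumption made for illustration: the normalization I(X;Y) / sqrt(H(X) H(Y)), the threshold-based grouping, the integer coding of value combinations, and the names `normalized_mi`, `cluster_features`, and `compress_block` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log, sqrt


def entropy(column):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def normalized_mi(x, y):
    """I(X;Y) / sqrt(H(X) * H(Y)): one common normalization (an assumption)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant column shares no information
    mi = hx + hy - entropy(list(zip(x, y)))
    return max(mi, 0.0) / sqrt(hx * hy)


def cluster_features(columns, threshold=0.5):
    """Group features into blocks.

    Hypothetical rule: two features land in the same block when they are
    linked by a chain of pairs whose normalized MI is >= threshold.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if normalized_mi(columns[a], columns[b]) >= threshold:
                parent[find(a)] = find(b)

    blocks = {}
    for name in names:
        blocks.setdefault(find(name), []).append(name)
    return list(blocks.values())


def compress_block(columns, block):
    """Replace each distinct combination of block values by one integer code."""
    codes = {}
    rows = zip(*(columns[name] for name in block))
    return [codes.setdefault(row, len(codes)) for row in rows]


# Toy data: "b" copies "a", and "d" is the negation of "c".
data = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 0, 1, 1, 0, 1, 0, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
    "d": [1, 0, 1, 0, 1, 0, 1, 0],
}
blocks = cluster_features(data, threshold=0.5)
compressed = [compress_block(data, block) for block in blocks]
```

Each compressed column acts as a single discrete variable standing in for its block, which is the kind of input a score-based search such as Hill-Climbing could consume when connecting blocks. That search step is not shown here.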
