A Ring-Based Distributed Algorithm for Learning High-Dimensional Bayesian Networks
This work addresses the computational bottleneck of learning Bayesian Networks from high-dimensional data, offering a faster distributed solution that retains the theoretical consistency of GES. The contribution is incremental in that it builds directly on GES.
The paper tackles learning Bayesian Networks from high-dimensional data by proposing a distributed ring-based algorithm that uses Greedy Equivalence Search (GES) as the local learning method. The approach reduces CPU time while maintaining the theoretical guarantees of GES, and experiments on three domains of 400-1000 variables show its effectiveness compared with GES and its fast version (fGES).
Learning Bayesian Networks (BNs) from high-dimensional data is a complex and time-consuming task. Although there are approaches based on horizontal (instances) or vertical (variables) partitioning in the literature, none can guarantee the same theoretical properties as the Greedy Equivalence Search (GES) algorithm, except those based on the GES algorithm itself. In this paper, we propose a directed ring-based distributed method that uses GES as the local learning algorithm, ensuring the same theoretical properties as GES but requiring less CPU time. The method involves partitioning the set of possible edges and constraining each processor in the ring to work only with its received subset. The global learning process is an iterative algorithm that carries out several rounds until a convergence criterion is met. In each round, each processor receives a BN from its predecessor in the ring, fuses it with its own BN model, and uses the result as the starting solution for a local learning process constrained to its set of edges. Subsequently, it sends the model obtained to its successor in the ring. Experiments were carried out on three large domains (400-1000 variables), demonstrating our proposal's effectiveness compared to GES and its fast version (fGES).
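The round structure described above can be illustrated with a small, sequentially simulated sketch. It is not the paper's implementation, and it rests on several assumptions: the fusion step is a simple acyclicity-preserving edge union, the local learner is a greedy edge-toggling search standing in for constrained GES, the score is a toy function over a hypothetical target graph, and convergence means "no round improved the best score". All function names (`partition_edges`, `fuse`, `local_search`, `ring_learn`) are illustrative.

```python
from itertools import permutations

def partition_edges(variables, k):
    """Split all candidate directed edges into k disjoint subsets."""
    edges = list(permutations(variables, 2))
    return [set(edges[i::k]) for i in range(k)]

def is_acyclic(edges):
    """Kahn-style check that a set of directed edges forms a DAG."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for u, v in edges:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    frontier.append(v)
    return seen == len(nodes)

def fuse(received, own):
    """Assumed fusion: keep own edges, add received edges that keep a DAG."""
    fused = set(own)
    for e in sorted(received):
        if is_acyclic(fused | {e}):
            fused.add(e)
    return fused

def local_search(start, allowed, score):
    """Greedy stand-in for constrained GES: toggle only allowed edges."""
    model = set(start)
    improved = True
    while improved:
        improved = False
        for e in sorted(allowed):
            candidate = model ^ {e}  # add the edge if absent, delete it if present
            if is_acyclic(candidate) and score(candidate) > score(model):
                model, improved = candidate, True
    return model

def ring_learn(variables, k, score, max_rounds=10):
    """Simulate the ring: receive from predecessor, fuse, learn locally, pass on."""
    subsets = partition_edges(variables, k)
    models = [set() for _ in range(k)]
    best = float("-inf")
    for _ in range(max_rounds):
        # Snapshot so every processor sees its predecessor's previous-round model.
        previous = [set(m) for m in models]
        for i in range(k):
            start = fuse(previous[(i - 1) % k], models[i])
            models[i] = local_search(start, subsets[i], score)
        round_best = max(score(m) for m in models)
        if round_best <= best:  # assumed convergence criterion
            break
        best = round_best
    return max(models, key=score)

# Toy usage: the score rewards edges of a hypothetical target chain A -> B -> C -> D.
target = {("A", "B"), ("B", "C"), ("C", "D")}

def toy_score(model):
    return len(model & target) - len(model - target)

learned = ring_learn(["A", "B", "C", "D"], k=2, score=toy_score)
```

In a real deployment each ring member would run as its own process and score candidate models against data (for example with BDeu or BIC); the sequential loop here only mimics the message passing between neighbours.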