CO | AI | DC | LG | Jan 22, 2023

Parallel Approaches to Accelerate Bayesian Decision Trees

arXiv:2301.09090v1 | 4 citations | h-index: 47
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck of Bayesian decision trees, which matters to researchers and practitioners working with large datasets. The contribution is incremental, since it builds on existing MCMC-based methods.

The paper tackles the slow runtime of MCMC-based Bayesian decision trees by proposing two parallel methods: (1) replacing MCMC with a Sequential Monte Carlo (SMC) sampler, which is inherently parallel, and (2) partitioning the data. In the reported experiments, data partitioning has limited utility, while the SMC sampler improves runtime by up to a factor of 343 compared to a sequential implementation.

Markov Chain Monte Carlo (MCMC) is a well-established family of algorithms primarily used in Bayesian statistics to sample from a target distribution when direct sampling is challenging. Existing work on Bayesian decision trees uses MCMC. Unfortunately, this can be slow, especially when considering large volumes of data. It is hard to parallelise the accept-reject component of MCMC. Nonetheless, we propose two methods for exploiting parallelism: in the first, we replace MCMC with another numerical Bayesian approach, the Sequential Monte Carlo (SMC) sampler, which has the appealing property that it is an inherently parallel algorithm; in the second, we consider data partitioning. Both methods use multi-core processing with a High-Performance Computing (HPC) resource. We test the two methods in various study settings to determine which method is the most beneficial for each test case. Experiments show that data partitioning has limited utility in the settings we consider and that the use of the SMC sampler can improve run-time (compared to the sequential implementation) by up to a factor of 343.
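To illustrate why an SMC sampler parallelises more naturally than a single MCMC chain, the following is a minimal sketch on a toy one-dimensional target. It is not the paper's decision-tree sampler. The target (a standard normal), the initial distribution, the tempering schedule, and the names `log_target`, `log_initial`, and `smc_sampler` are all assumptions made for illustration. The point it shows is structural: the reweight and move steps act on every particle independently, so they could be distributed across cores, and only weight normalisation and resampling require communication between particles.

```python
import numpy as np

SCALE = 5.0  # assumed spread of the initial distribution


def log_target(x):
    # Hypothetical toy target: unnormalised log-density of N(0, 1).
    return -0.5 * x**2


def log_initial(x):
    # Unnormalised log-density of the initial distribution N(0, SCALE^2).
    return -0.5 * (x / SCALE) ** 2


def smc_sampler(n_particles=2000, n_steps=20, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, SCALE, size=n_particles)  # one draw per particle
    log_w = np.zeros(n_particles)
    temps = np.linspace(0.0, 1.0, n_steps + 1)

    for t_prev, t in zip(temps[:-1], temps[1:]):
        # Reweight: independent per particle (parallelisable).
        log_w += (t - t_prev) * (log_target(x) - log_initial(x))

        # Normalise and resample: the only steps that need all particles.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w**2)  # effective sample size
        if ess < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x = x[idx]
            log_w = np.zeros(n_particles)

        # Move: one random-walk Metropolis step per particle, targeting
        # the current tempered density. Each particle has its own
        # accept-reject decision, so this is also parallelisable.
        def log_pi(z):
            return (1.0 - t) * log_initial(z) + t * log_target(z)

        proposal = x + step * rng.normal(size=n_particles)
        log_u = np.log(rng.uniform(size=n_particles))
        accept = log_u < log_pi(proposal) - log_pi(x)
        x = np.where(accept, proposal, x)

    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return x, w


if __name__ == "__main__":
    particles, weights = smc_sampler()
    mean = np.sum(weights * particles)
    var = np.sum(weights * (particles - mean) ** 2)
    print(f"weighted mean: {mean:.3f}, weighted variance: {var:.3f}")
```

Here NumPy vectorisation stands in for the per-particle parallelism. On an HPC resource, the particle array would instead be split across cores, with a collective operation for the normalisation and resampling steps. By contrast, each step of a single MCMC chain depends on the outcome of the previous accept-reject decision, which is why that component is hard to parallelise.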
