CO Jan 22, 2023
Parallel Approaches to Accelerate Bayesian Decision Trees
Efthyvoulos Drousiotis, Paul G. Spirakis, Simon Maskell
Markov Chain Monte Carlo (MCMC) is a well-established family of algorithms primarily used in Bayesian statistics to sample from a target distribution when direct sampling is challenging. Existing work on Bayesian decision trees uses MCMC. Unfortunately, this can be slow, especially when considering large volumes of data. It is hard to parallelise the accept-reject component of MCMC. Nonetheless, we propose two methods for exploiting parallelism in MCMC: in the first, we replace MCMC with another numerical Bayesian approach, the Sequential Monte Carlo (SMC) sampler, which has the appealing property that it is an inherently parallel algorithm; in the second, we consider data partitioning. Both methods use multi-core processing with a High-Performance Computing (HPC) resource. We test the two methods in various study settings to determine which method is the most beneficial for each test case. Experiments show that data partitioning has limited utility in the settings we consider and that the use of the SMC sampler can improve run-time (compared to the sequential implementation) by up to a factor of 343.
AI Jul 26, 2022
Single MCMC Chain Parallelisation on Decision Trees
Efthyvoulos Drousiotis, Paul G. Spirakis
Decision trees are widely used in machine learning and often achieve state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees lack a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian decision trees depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high-dimensional data and with expensive proposals. In this study, we propose a method to parallelise a single MCMC decision tree chain on an average laptop or personal computer, which enables us to reduce its run-time through multi-core processing while the results remain statistically identical to the conventional sequential implementation. We also calculate the theoretical and practical reduction in run-time that can be obtained using our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time under the constraint that the serial and parallel implementations are statistically identical.
9.0 GT Mar 11
Instant Runoff Voting on Graphs: Exclusion Zones and Distortion
Georgios Birmpas, Georgios Chionas, Efthyvoulos Drousiotis et al.
We study instant-runoff voting (IRV) under metric preferences induced by an unweighted graph where each vertex hosts a voter, candidates occupy some vertices (at most one candidate per vertex), and voters rank candidates by shortest-path distance with fixed deterministic tie-breaking. We focus on exclusion zones: vertex sets S such that whenever some candidate lies in S, the IRV winner must also lie in S. While testing whether a given set S is an exclusion zone is co-NP-complete and finding the minimum exclusion zone is NP-hard in general graphs, we show here that both problems can be solved in polynomial time on trees. Our approach solves zone testing by designing a Kill membership test (can a designated candidate be forced to lose using opponents from a restricted set?) and shows that Kill can be decided in polynomial time on trees via a bottom-up dynamic program that certifies whether the designated candidate can be eliminated in round 1. A greedy shrinking process then recovers the minimum zone under a standard nesting assumption. To clarify the limits of tractability beyond trees, we also identify a rule-level property (Strong Forced Elimination) that abstracts the key IRV behavior used in prior reductions, and show that exclusion-zone verification and minimum-zone computation remain co-NP-complete and NP-hard, respectively, for any deterministic rank-based elimination rule satisfying this property. Finally, we relate IRV to utilitarian distortion in this discrete setting, and we present upper and lower bounds on the distortion of IRV for several scenarios, including perfect binary trees and unweighted graphs.
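As an illustration of the voting model described in this abstract, the following is a minimal sketch of IRV on an unweighted graph, not the authors' implementation. It assumes a connected graph given as an adjacency dictionary with comparable vertex labels. The function names `bfs_distances` and `irv_winner` are hypothetical, and both tie-breaking rules (voters prefer the smaller vertex label among equidistant candidates; the smaller label is eliminated first among candidates with equally few votes) are assumptions chosen only to make the procedure deterministic.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def irv_winner(adj, candidates):
    """Run IRV with one voter per vertex; returns the winning candidate vertex.

    Assumes the graph is connected, so every voter can reach every candidate.
    """
    dist = {c: bfs_distances(adj, c) for c in candidates}
    remaining = set(candidates)
    while len(remaining) > 1:
        votes = {c: 0 for c in remaining}
        for voter in adj:
            # Assumed tie-break: nearest candidate, then smaller vertex label.
            favourite = min(remaining, key=lambda c: (dist[c][voter], c))
            votes[favourite] += 1
        # Assumed tie-break: fewest votes, then smaller vertex label, is eliminated.
        loser = min(remaining, key=lambda c: (votes[c], c))
        remaining.remove(loser)
    return next(iter(remaining))


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 - 4 with candidates at vertices 0, 2 and 4.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(irv_winner(path, [0, 2, 4]))
```

On the five-vertex path above, candidate 4 receives one first-round vote and is eliminated, after which candidate 2 beats candidate 0 by three votes to two, so the sketch returns 2 under the assumed tie-breaking rules.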
LG May 30, 2023
Bayesian Decision Trees Inspired from Evolutionary Algorithms
Efthyvoulos Drousiotis, Alexander M. Phillips, Paul G. Spirakis et al.
Bayesian Decision Trees (DTs) are generally considered a more advanced and accurate model than a regular Decision Tree (DT) because they can handle complex and uncertain data. Existing work on Bayesian DTs uses Markov Chain Monte Carlo (MCMC) with an accept-reject mechanism and samples using naive proposals to proceed to the next iteration, which can be slow because of the burn-in time needed. We can reduce the burn-in period by proposing a more sophisticated way of sampling or by designing a different numerical Bayesian approach. In this paper, we propose replacing MCMC with an inherently parallel algorithm, the Sequential Monte Carlo (SMC) sampler, together with a more effective sampling strategy inspired by Evolutionary Algorithms (EA). Experiments show that SMC combined with the EA can produce more accurate results than MCMC in 100 times fewer iterations.