FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification
This work provides a speed-optimized implementation for high energy physics applications, such as the Belle II experiment, but is incremental as it focuses on engineering optimizations rather than new algorithmic paradigms.
The paper tackled the problem of slow execution times in stochastic gradient-boosted decision trees for multivariate classification by introducing FastBDT, which achieved an order of magnitude speed improvement in both fitting and application phases compared to popular frameworks like TMVA, scikit-learn, and XGBoost.
Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fitting-phase and application-phase, in comparison with popular implementations in software frameworks like TMVA, scikit-learn and XGBoost. The concepts used to optimize the execution time and performance studies are discussed in detail in this paper. The key ideas include: An equal-frequency binning on the input data, which allows replacing expensive floating-point with integer operations, while at the same time increasing the quality of the classification; a cache-friendly linear access pattern to the input data, in contrast to usual implementations, which exhibit a random access pattern. FastBDT provides interfaces to C/C++, Python and TMVA. It is extensively used in the field of high energy physics by the Belle II experiment.