LG · ML · Nov 23, 2022

SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

arXiv:2211.12858v1 · 17 citations · h-index: 6
Originality: Highly original
AI Analysis

SketchBoost addresses a bottleneck in applying GBDT to high-dimensional multioutput tasks, offering data science practitioners a significant training speedup.

The paper tackles the poor scalability of Gradient Boosted Decision Trees (GBDT) on multioutput problems by proposing SketchBoost, which accelerates training by approximating the scoring function used to find the best split, achieving speedups of more than 40 times with comparable or better performance.

Gradient Boosted Decision Tree (GBDT) is a widely-used machine learning algorithm that has been shown to achieve state-of-the-art results on many standard data science problems. We are interested in its application to multioutput problems when the output is highly multidimensional. Although there are highly effective GBDT implementations, their scalability to such problems is still unsatisfactory. In this paper, we propose novel methods aiming to accelerate the training process of GBDT in the multioutput scenario. The idea behind these methods lies in the approximate computation of a scoring function used to find the best split of decision trees. These methods are implemented in SketchBoost, which itself is integrated into our easily customizable Python-based GPU implementation of GBDT called Py-Boost. Our numerical study demonstrates that SketchBoost speeds up the training process of GBDT by up to over 40 times while achieving comparable or even better performance.
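The idea of approximating the split-scoring function can be illustrated with a minimal NumPy sketch. It assumes a first-order score of the form "squared norm of the summed gradients in each child, divided by child size plus a regularizer", and shows two possible ways to shrink the n x d gradient matrix before scoring: keeping the k columns with the largest norm, and a Gaussian random projection. The function names, the `lam` parameter, and the exact form of the score are illustrative assumptions; this is not the Py-Boost API and not necessarily the authors' exact formulation.

```python
import numpy as np

def split_score(G, left_mask, lam=1.0):
    """Assumed score: sum over both children of ||sum of gradients||^2 / (n_child + lam)."""
    score = 0.0
    for mask in (left_mask, ~left_mask):
        g_sum = G[mask].sum(axis=0)               # summed gradient vector, shape (d,)
        score += float(g_sum @ g_sum) / (mask.sum() + lam)
    return score

def top_outputs_sketch(G, k):
    """Keep the k output columns with the largest Euclidean norm (hypothetical helper)."""
    norms = np.linalg.norm(G, axis=0)
    top = np.argsort(norms)[-k:]
    return G[:, top]

def random_projection_sketch(G, k, rng):
    """Project the d output columns onto k random Gaussian directions (hypothetical helper)."""
    d = G.shape[1]
    P = rng.normal(size=(d, k)) / np.sqrt(k)      # scaling keeps squared norms unbiased
    return G @ P

# Synthetic data: 1000 samples, 500 outputs, one candidate split.
rng = np.random.default_rng(0)
G = rng.normal(size=(1000, 500))
left = rng.normal(size=1000) < 0.0

full = split_score(G, left)                                        # cost grows with d = 500
approx = split_score(random_projection_sketch(G, 20, rng), left)   # cost grows with k = 20
print(full, approx)
```

Because the score is evaluated for every candidate split, replacing d columns with k much smaller than d reduces the per-split cost roughly in proportion to d / k; the leaf values themselves can still be computed from the full gradient matrix once the tree structure is fixed.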

Code Implementations: 1 repo
