LGNov 21, 2022

High-Order Optimization of Gradient Boosted Decision Trees

arXiv:2211.11367v1h-index: 5
Originality Incremental advance
AI Analysis

This work addresses efficiency improvements for GBDTs, which are widely used in tabular data modeling, but it appears incremental as it builds on existing optimization methods without a paradigm shift.

The paper tackles the problem of slow convergence in Gradient Boosted Decision Trees (GBDTs) by introducing high-order optimization based on numerical optimization theory, resulting in faster per-iteration convergence and reduced running time in experiments.

Gradient Boosted Decision Trees (GBDTs) are dominant machine learning algorithms for modeling discrete or tabular data. Unlike neural networks with millions of trainable parameters, GBDTs optimize loss function in an additive manner and have a single trainable parameter per leaf, which makes it easy to apply high-order optimization of the loss function. In this paper, we introduce high-order optimization for GBDTs based on numerical optimization theory which allows us to construct trees based on high-order derivatives of a given loss function. In the experiments, we show that high-order optimization has faster per-iteration convergence that leads to reduced running time. Our solution can be easily parallelized and run on GPUs with little overhead on the code. Finally, we discuss future potential improvements such as automatic differentiation of arbitrary loss function and combination of GBDTs with neural networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes