LGMay 18, 2023

Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

arXiv:2305.10696v114 citationsHas Code
Originality Incremental advance
AI Analysis

This addresses a known bottleneck in GBDT for machine learning practitioners, offering incremental improvements in performance and interpretability.

The paper tackles the bias in Gradient Boosting Decision Tree (GBDT) split finding, which causes interpretability and overfitting issues, by proposing UnbiasedGBM and unbiased gain, showing it outperforms popular GBDT implementations on 60 datasets and improves feature selection.

Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and Catboost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods. The codes are available at https://github.com/ZheyuAqaZhang/UnbiasedGBM.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes