Benchmarking state-of-the-art gradient boosting algorithms for classification
This study provides practical guidance for practitioners choosing among gradient boosting algorithms, though its contribution is incremental because it benchmarks existing methods rather than proposing new ones.
This study benchmarked four popular gradient boosting implementations (GBM, XGBoost, LightGBM, CatBoost) on diverse real-world datasets, comparing hyperparameter optimization strategies and evaluating performance in terms of accuracy, runtime, and tuning time. The results identified the variant offering the best balance between effectiveness, reliability, and ease of use.
This work explores the use of gradient boosting for classification. Four popular implementations, namely the original GBM algorithm and three state-of-the-art gradient boosting frameworks (XGBoost, LightGBM and CatBoost), were thoroughly compared on several publicly available, diverse real-world datasets. Special emphasis was placed on hyperparameter optimization, specifically on comparing two tuning strategies: randomized search and Bayesian optimization using the Tree-structured Parzen Estimator. The performance of the considered methods was investigated in terms of common classification accuracy metrics as well as runtime and tuning time. Additionally, the obtained results were validated using appropriate statistical testing. An attempt was made to identify the gradient boosting variant that offers the right balance between effectiveness, reliability and ease of use.
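To make the simpler of the two tuning strategies concrete, the following is a minimal sketch of randomized search written with the Python standard library only. It is not the study's code: the search space, the parameter ranges, and the function names `random_search` and `toy_objective` are assumptions introduced for illustration, and `toy_objective` is a stand-in for the cross-validated score of a real gradient boosting model.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the ones used in the study. Each entry draws one value from an RNG.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),    # log-uniform, about 0.001 to 1
    "max_depth": lambda rng: rng.randint(2, 10),              # integer, both ends inclusive
    "n_estimators": lambda rng: rng.randrange(50, 1001, 50),  # 50, 100, ..., 1000
}


def toy_objective(params):
    """Stand-in for a cross-validated score (higher is better).

    In a real benchmark this would train a boosting model with `params`
    and return its validation metric.
    """
    lr_penalty = (math.log10(params["learning_rate"]) + 1) ** 2  # smallest near 0.1
    depth_penalty = ((params["max_depth"] - 6) / 4) ** 2         # smallest at depth 6
    return 1.0 - 0.1 * lr_penalty - 0.05 * depth_penalty


def random_search(objective, space, n_trials, seed=0):
    """Sample `n_trials` configurations independently and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        # Every trial is drawn without looking at earlier results.
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


if __name__ == "__main__":
    best_params, best_score, _ = random_search(toy_objective, SEARCH_SPACE, n_trials=50)
    print(best_params, round(best_score, 4))
```

Each trial here is sampled independently of all earlier ones. Bayesian optimization with the Tree-structured Parzen Estimator differs at exactly that step: it uses the scores of past trials to decide where to sample next, which is why the two strategies can differ in both final accuracy and tuning time.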