LG · Nov 12, 2021

A Simple and Fast Baseline for Tuning Large XGBoost Models

arXiv:2111.06924v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement aimed at practitioners who need to tune models on large-scale tabular data efficiently.

The paper tackles the time-consuming hyperparameter tuning of large XGBoost models by proposing uniform subsampling as a fast baseline, and demonstrates its effectiveness on datasets ranging from 15 to 70 GB in size.

XGBoost, a scalable tree boosting algorithm, has proven effective for many prediction tasks of practical interest, especially using tabular datasets. Hyperparameter tuning can further improve the predictive performance, but unlike neural networks, full-batch training of many models on large datasets can be time-consuming. Owing to the discovery that (i) there is a strong linear relation between dataset size and training time, (ii) XGBoost models satisfy the ranking hypothesis, and (iii) lower-fidelity models can discover promising hyperparameter configurations, we show that uniform subsampling makes for a simple yet fast baseline to speed up the tuning of large XGBoost models using multi-fidelity hyperparameter optimization with data subsets as the fidelity dimension. We demonstrate the effectiveness of this baseline on large-scale tabular datasets ranging from $15$ to $70\,\mathrm{GB}$ in size.
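To make the idea of using data subsets as the fidelity dimension concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not name a specific multi-fidelity optimizer, so successive halving is used here as one common choice. The names `uniform_subsample`, `successive_halving`, and `toy_loss` are hypothetical, and `toy_loss` is a stand-in for training an XGBoost model on the sampled rows and returning its validation loss.

```python
import random


def uniform_subsample(n_rows, fraction, rng):
    """Pick a uniform random subset of row indices, without replacement."""
    k = max(1, round(n_rows * fraction))
    return sorted(rng.sample(range(n_rows), k))


def successive_halving(configs, n_rows, evaluate,
                       fractions=(0.1, 0.3, 1.0), keep=0.5, seed=0):
    """Rank configs on growing uniform subsamples and drop the worst each rung.

    evaluate(config, row_indices) must return a loss (lower is better).
    Successive halving is an assumed choice of multi-fidelity scheme.
    """
    rng = random.Random(seed)
    survivors = list(configs)
    for i, fraction in enumerate(fractions):
        rows = uniform_subsample(n_rows, fraction, rng)
        survivors = sorted(survivors, key=lambda c: evaluate(c, rows))
        if i < len(fractions) - 1:
            # Keep only the best share before moving to a larger subset.
            survivors = survivors[:max(1, int(len(survivors) * keep))]
    return survivors[0]


def toy_loss(config, rows):
    """Hypothetical stand-in for 'train on rows, return validation loss'.

    The 1/len(rows) term shrinks as the subset grows but is identical for
    every config, so the ranking is the same at every fidelity.
    """
    return ((config["max_depth"] - 6) ** 2
            + 100 * (config["eta"] - 0.1) ** 2
            + 1.0 / len(rows))


configs = [{"max_depth": d, "eta": e}
           for d in (2, 4, 6, 8) for e in (0.01, 0.1, 0.3)]
best = successive_halving(configs, n_rows=1000, evaluate=toy_loss)
print(best)
```

Most configurations are evaluated only on the smallest subsets, which is where the linear relation between dataset size and training time yields the savings. The scheme relies on the ranking hypothesis: if rankings on small subsets did not carry over to the full data, early rungs could discard the configuration that would have won.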

