LG | ML | Jan 25, 2019

Faster Boosting with Smaller Memory

arXiv:1901.09047v3 | 11 citations
Originality: Incremental advance
AI Analysis

This addresses memory constraints in boosting for large datasets, offering a practical solution for users with limited hardware, though it is incremental as it builds on existing methods.

The paper tackles the problem of boosting algorithms requiring large memory to achieve high speed, presenting an alternative approach that uses early stopping, effective sample size, and stratified sampling to achieve a 10-100x speedup over XGBoost when training data is too large to fit in memory.

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees that achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. It does so by combining three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
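To make the "effective sample size" idea concrete, here is a minimal sketch. The abstract does not give a formula, so this assumes the standard Kish definition, n_eff = (sum of weights)^2 / (sum of squared weights). The function names and the 0.5 threshold are hypothetical and are not taken from the paper or from any library.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Assumption: this is the standard definition, used here only for
    illustration; the paper's exact formulation may differ.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        return 0.0
    return total * total / total_sq


def needs_resample(weights, threshold_ratio=0.5):
    """Hypothetical rule: resample when n_eff falls below a fraction of n."""
    return effective_sample_size(weights) < threshold_ratio * len(weights)


# Equal weights: every example counts fully.
uniform = [1.0] * 8
# One example dominates: the in-memory sample behaves like far fewer examples.
skewed = [8.0] + [0.01] * 7

print(effective_sample_size(uniform))
print(effective_sample_size(skewed))
print(needs_resample(uniform), needs_resample(skewed))
```

Under this definition, as boosting concentrates weight on a few hard examples, the effective size of an in-memory sample shrinks, which would be a natural signal to draw a fresh sample from disk.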

Code Implementations: 2 repos