LG · ML · Jan 25, 2019

Faster Boosting with Smaller Memory

arXiv:1901.09047v35.411 citations · h-index: 47 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses memory constraints in boosting on large datasets, offering a practical solution for users with limited hardware. The contribution is incremental, since it builds on existing methods.

The paper tackles the problem of boosting algorithms requiring large memory to achieve high speed, presenting an alternative approach that uses early stopping, effective sample size, and stratified sampling to achieve a 10-100x speedup over XGBoost when training data is too large to fit in memory.

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. This is achieved using a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
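The abstract names effective sample size as one of the three techniques but does not define it. As a rough illustration only, the sketch below uses the standard Kish formula, n_eff = (Σw)² / Σw², which is a common way to measure how many examples in a weighted sample still carry meaningful weight. Whether the paper uses exactly this definition is an assumption, and the `needs_resampling` helper with its `threshold` parameter is hypothetical, not taken from the paper or its code.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / sum of w^2.

    Assumed definition; the abstract does not spell out the formula.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        return 0.0
    return total * total / total_sq


def needs_resampling(weights, threshold=0.5):
    """Hypothetical trigger: flag the in-memory sample for refresh
    when n_eff / n falls below `threshold` (an illustrative value)."""
    n = len(weights)
    if n == 0:
        return False
    return effective_sample_size(weights) / n < threshold


# Equal weights: every example counts fully.
uniform = [1.0] * 8
# One dominant weight: the sample behaves like far fewer examples.
skewed = [8.0] + [0.1] * 7

print(effective_sample_size(uniform))
print(effective_sample_size(skewed))
print(needs_resampling(uniform), needs_resampling(skewed))
```

With equal weights the effective size equals the actual sample size. As boosting concentrates weight on a few hard examples, the effective size shrinks, which is the kind of signal a memory-limited learner could use to decide when to draw a fresh sample from disk.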
