LG · DC · ML · Apr 12, 2018

Asynch-SGBDT: Asynchronous Parallel Stochastic Gradient Boosting Decision Tree based on Parameters Server

arXiv:1804.04659v4 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck in GBDT training for AI researchers and industry practitioners, but it is incremental as it builds on existing parameter server frameworks with asynchronous parallelism.

The paper tackles slow training of Gradient Boosting Decision Trees (GBDT) by proposing an asynchronous parallel method called asynch-SGBDT. The method reaches linear speedup when the dataset has high sample diversity and the sampling rate, step length, and tree settings are chosen appropriately.

In AI research and industry, machine learning is the most widely used tool. One of the most important machine learning algorithms is the Gradient Boosting Decision Tree (GBDT), whose training process requires considerable computational resources and time. To shorten GBDT training time, many works have tried to run GBDT on a Parameter Server. However, those GBDT algorithms are synchronous parallel algorithms, which fail to make full use of the Parameter Server. In this paper, we examine the possibility of using asynchronous parallel methods to train a GBDT model and name this algorithm asynch-SGBDT (asynchronous parallel stochastic gradient boosting decision tree). Our theoretical and experimental results indicate that the scalability of asynch-SGBDT is influenced by the sample diversity of the dataset, the sampling rate, the step length, and the settings of the GBDT trees. Experimental results also show that the asynch-SGBDT training process reaches a linear speedup in an asynchronous parallel manner when the datasets and GBDT trees meet high scalability requirements.
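To make the abstract's vocabulary concrete (sampling rate, step length, asynchronous updates), here is a minimal single-process sketch. It is not the authors' implementation. It assumes squared loss, depth-1 trees (stumps) on one feature, and it imitates asynchrony by letting each new tree be fitted against a model snapshot that is `staleness` updates old. The names `fit_stump`, `train`, `mse`, and the `staleness` parameter are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def fit_stump(xs, residuals):
    """Fit a one-split regression tree to (xs, residuals) by least squares."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    best = None
    left_sum = 0.0
    for k in range(1, n):
        i = order[k - 1]
        left_sum += residuals[i]
        if xs[order[k]] == xs[i]:
            continue  # cannot split between equal feature values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            threshold = (xs[i] + xs[order[k]]) / 2
            best = (score, threshold, left_sum / k, right_sum / (n - k))
    if best is None:  # single point or all xs equal: predict the mean
        mean = total / n
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def train(xs, ys, rounds=200, sampling_rate=0.5, step=0.1, staleness=0, seed=0):
    """Stochastic gradient boosting with simulated stale gradients.

    staleness=0 behaves like ordinary (synchronous) stochastic boosting.
    staleness=k fits each tree to residuals from a snapshot k updates old,
    a stand-in for a worker that pulled the model before k other pushes.
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = [0.0] * n  # "server" state: current model output per sample
    snapshots = deque([preds], maxlen=staleness + 1)
    trees = []
    for _ in range(rounds):
        stale = snapshots[0]  # oldest retained snapshot
        idx = [i for i in range(n) if rng.random() < sampling_rate]
        if not idx:
            idx = [rng.randrange(n)]
        # Negative gradient of squared loss is the residual y - F(x).
        residuals = [ys[i] - stale[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], residuals)
        trees.append(stump)
        # "Push": the server adds the new tree scaled by the step length.
        preds = [p + step * predict_stump(stump, x) for p, x in zip(preds, xs)]
        snapshots.append(preds)
    return trees, preds
```

Calling `train(xs, ys, staleness=0)` and `train(xs, ys, staleness=4)` on the same data and comparing `mse` of the returned predictions gives a rough feel for how stale gradients interact with the step length and sampling rate. A real Parameter Server run would involve concurrent workers and network delays, which this sketch does not model.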

