DS, LG · Sep 17, 2019

Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

arXiv:1909.07633v1
Originality: Incremental advance
AI Analysis

This addresses communication bottlenecks in distributed GBDT training, which is crucial for large-scale machine learning applications, though it appears incremental because it builds on existing approximate tree-learning techniques.

The paper tackles communication overhead in distributed training of gradient boosting decision trees (GBDT) by proposing two communication-efficient methods: a weighted sampling approach that estimates information gain from a small subset of the data, and distributed protocols for the weighted quantile problem that arises in approximate tree learning.
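To make the first idea concrete, here is a minimal Python/NumPy sketch of estimating a split's information gain from a weighted sample instead of the full dataset. It is not the paper's algorithm: the XGBoost-style second-order gain, the sampling probabilities p_i proportional to |g_i| + h_i, and the function names `split_gain`, `exact_gain`, and `sampled_gain` are assumptions chosen for illustration. Drawing m instances with replacement and weighting each by 1/(m·p_i) gives unbiased estimates of the gradient and hessian sums on each side of the split, which are then plugged into the gain formula.

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Second-order split gain in the XGBoost style (an assumed gain definition)."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def exact_gain(x, g, h, threshold, lam=1.0):
    """Gain of the split x < threshold computed over every instance."""
    left = x < threshold
    return split_gain(g[left].sum(), h[left].sum(),
                      g[~left].sum(), h[~left].sum(), lam)

def sampled_gain(x, g, h, threshold, m, rng, lam=1.0):
    """Estimate the same gain from m instances drawn with replacement.

    Hypothetical sampling rule: instance i is drawn with probability
    p_i proportional to |g_i| + h_i. Scaling each draw by 1 / (m * p_i)
    (Hansen-Hurwitz estimator) makes the side sums of g and h unbiased.
    """
    w = np.abs(g) + h
    p = w / w.sum()
    idx = rng.choice(len(g), size=m, replace=True, p=p)
    scale = 1.0 / (m * p[idx])          # inverse-probability weights
    left = x[idx] < threshold
    gs, hs = g[idx] * scale, h[idx] * scale
    return split_gain(gs[left].sum(), hs[left].sum(),
                      gs[~left].sum(), hs[~left].sum(), lam)

# Usage: squared loss with a constant prediction of 0.2, so g = 0.2 - y and h = 1.
rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(size=n)
y = (x > 0.3).astype(float)
g = 0.2 - y
h = np.ones(n)
print(exact_gain(x, g, h, 0.3), sampled_gain(x, g, h, 0.3, m=2_000, rng=rng))
```

Because the gain is a nonlinear function of the estimated sums, the sampled value is slightly biased for small m; the sketch only illustrates that m sampled records, rather than all n, feed the split evaluation.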

Gradient boosting decision tree (GBDT) is a powerful and widely used machine learning model, which has achieved state-of-the-art performance in many academic areas and production environments. However, communication overhead is the main bottleneck in the distributed training needed to handle today's massive datasets. In this paper, we propose two novel communication-efficient methods over distributed datasets to mitigate this problem: a weighted sampling approach by which we can efficiently estimate the information gain over a small subset, and distributed protocols for the weighted quantile problem used in approximate tree learning.
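The weighted quantile problem can be sketched in the same spirit. In approximate tree learning, candidate split points for a feature are typically weighted quantiles of its values (XGBoost, for example, weights each instance by its hessian h_i). The Python/NumPy sketch below is a simple merge-of-summaries protocol, not the protocol from the paper: each worker compresses its shard to at most `size` (value, weight) pairs, only those pairs go to a coordinator, and the coordinator computes quantiles over the merged pairs. The names `weighted_quantiles`, `local_summary`, and `distributed_quantiles` and the bucketing rule are hypothetical.

```python
import numpy as np

def weighted_quantiles(values, weights, qs):
    """For each q in qs, return the smallest value whose cumulative
    weight fraction reaches q."""
    order = np.argsort(values, kind="stable")
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cdf, qs, side="left")
    return v[np.minimum(idx, len(v) - 1)]

def local_summary(values, weights, size):
    """Compress one worker's shard to at most `size` (value, weight) pairs.

    Sorted points are grouped into consecutive buckets of roughly equal
    weight; each bucket is represented by its largest value carrying the
    bucket's total weight (a hypothetical compression rule).
    """
    order = np.argsort(values, kind="stable")
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # Bucket k holds the points whose running weight falls in the k-th 1/size slice.
    bucket = np.minimum((cum / cum[-1] * size).astype(int), size - 1)
    # Last sorted index of every bucket, i.e. the bucket's largest value.
    ends = np.append(np.flatnonzero(np.diff(bucket)), len(v) - 1)
    bucket_weight = np.diff(np.concatenate(([0.0], cum[ends])))
    return v[ends], bucket_weight

def distributed_quantiles(shards, qs, size=64):
    """Coordinator: merge every worker's summary, then query it."""
    vals, wts = zip(*(local_summary(v, w, size) for v, w in shards))
    return weighted_quantiles(np.concatenate(vals), np.concatenate(wts), qs)

# Usage: four workers, each holding 10,000 (feature value, weight) pairs.
rng = np.random.default_rng(1)
shards = [(rng.normal(size=10_000), rng.uniform(size=10_000)) for _ in range(4)]
print(distributed_quantiles(shards, np.linspace(0.1, 0.9, 9), size=64))
```

Each worker sends at most `size` pairs regardless of how many rows it holds. Under this rule a returned quantile never falls below the requested weighted rank and overshoots it by at most about 2/size of the total weight (plus a few individual point weights), so a larger `size` trades communication for accuracy.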

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it, not by global fame.
