Gradient and Newton Boosting for Classification and Regression
This work addresses a methodological gap in boosting algorithms for classification and regression, providing insights for applied data scientists and researchers, though it is incremental in nature.
The paper clarifies the distinction between gradient and Newton boosting by presenting them in a unified framework and comparing their predictive accuracy on various datasets. It finds that Newton boosting outperforms gradient and hybrid boosting on the majority of datasets, and presents evidence that this is not due to faster convergence.
Boosting algorithms are frequently used in applied data science and in research. To date, the distinction between boosting with gradient descent updates and boosting with second-order Newton updates is often not made in either applied or methodological research, and it is thus implicitly assumed that the difference is irrelevant. The goal of this article is to clarify this situation. In particular, we present gradient and Newton boosting, as well as a hybrid variant of the two, in a unified framework. We compare these boosting algorithms with trees as base learners using various datasets and loss functions. Our experiments show that Newton boosting outperforms gradient and hybrid gradient-Newton boosting in terms of predictive accuracy on the majority of datasets. We also present evidence that the reason for this is not faster convergence of Newton boosting. In addition, we introduce a novel tuning parameter for tree-based Newton boosting which is interpretable and important for predictive accuracy.
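
To make the three variants concrete, the following is a minimal sketch and not the authors' implementation. It assumes the binary logistic loss, a single feature, and depth-one trees (stumps); the toy data, the function names, the number of rounds, and the shrinkage value are all hypothetical. It follows one common formulation: gradient boosting fits the stump to the negative gradient g by least squares; Newton boosting fits it to -g/h by least squares weighted with the Hessian h, which yields leaf values -sum(g)/sum(h); the hybrid variant takes the split from the gradient fit and the leaf values from the Newton step. Any line search is omitted.

```python
import math

def grad_hess(y, f):
    """Gradient and Hessian of the logistic loss in the score f, labels in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-f))
    return p - y, p * (1.0 - p)

def log_loss(y, f):
    """Mean logistic loss: log(1 + exp(f)) - y * f."""
    return sum(math.log1p(math.exp(fi)) - yi * fi for yi, fi in zip(y, f)) / len(y)

def leaves(x, thr):
    """Index sets of the two leaves of a stump that splits at thr."""
    left = [i for i, xi in enumerate(x) if xi <= thr]
    right = [i for i, xi in enumerate(x) if xi > thr]
    return left, right

def best_split(x, target, weight):
    """Threshold of the stump with the lowest weighted squared error on target."""
    xs = sorted(set(x))
    best_thr, best_gain = None, -1.0
    for lo, hi in zip(xs, xs[1:]):
        thr = 0.5 * (lo + hi)
        gain = 0.0
        for idx in leaves(x, thr):
            sw = sum(weight[i] for i in idx)
            swt = sum(weight[i] * target[i] for i in idx)
            gain += swt * swt / sw  # larger gain means lower weighted squared error
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr

def boost(x, y, mode, rounds=10, lr=0.5):
    """Boost stumps; mode is 'gradient', 'hybrid', or 'newton' (illustrative names)."""
    f = [0.0] * len(x)
    for _ in range(rounds):
        gh = [grad_hess(yi, fi) for yi, fi in zip(y, f)]
        g = [a for a, _ in gh]
        h = [b for _, b in gh]
        if mode == "newton":
            # Tree structure from a Hessian-weighted fit to the Newton target -g/h.
            thr = best_split(x, [-gi / hi for gi, hi in zip(g, h)], h)
        else:
            # Tree structure from an unweighted fit to the negative gradient.
            thr = best_split(x, [-gi for gi in g], [1.0] * len(x))
        for idx in leaves(x, thr):
            if mode == "gradient":
                val = -sum(g[i] for i in idx) / len(idx)  # mean negative gradient
            else:
                val = -sum(g[i] for i in idx) / sum(h[i] for i in idx)  # Newton step
            for i in idx:
                f[i] += lr * val  # shrinkage applied to the leaf value
    return f

# Hypothetical, non-separable toy data.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]
losses = {m: log_loss(y, boost(x, y, m)) for m in ("gradient", "hybrid", "newton")}
for m, v in losses.items():
    print(f"{m:8s} training loss after 10 rounds: {v:.4f}")
```

On a training set this small the printed losses say nothing about the predictive accuracy compared in the article; the sketch only shows where the three updates differ, namely in how the split is chosen and how the leaf values are computed.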