Infinitesimal gradient boosting
This work provides a theoretical foundation for gradient boosting. The contribution is incremental, but it clarifies the algorithm's asymptotic behavior for machine learning researchers.
The authors study gradient boosting in the vanishing-learning-rate limit. They introduce infinitesimal gradient boosting and show that the algorithm converges to the unique solution of a nonlinear ODE in an infinite-dimensional function space. Along this solution, the residuals remain centered and the total variation is controlled.
We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning. The limit is considered in the vanishing-learning-rate asymptotic, that is, when the learning rate tends to zero and the number of gradient trees is rescaled accordingly. For this purpose, we introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees and using a softmax distribution for binary splitting. Our main result is the convergence of the associated stochastic algorithm and the characterization of the limiting procedure as the unique solution of a nonlinear ordinary differential equation in an infinite-dimensional function space. Infinitesimal gradient boosting defines a smooth path in the space of continuous functions along which the training error decreases, the residuals remain centered and the total variation is well controlled.
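To make the vanishing-learning-rate rescaling concrete, here is a minimal illustrative sketch in Python. It is not the authors' algorithm: it assumes squared-error loss, one-dimensional inputs, depth-one trees (stumps), and a simple between-group sum of squares as the split score. The names `softmax_split`, `boost`, `n_candidates`, `beta` and `horizon` are hypothetical and chosen only for this illustration. The number of trees is set to `horizon / lr`, so that shrinking the learning rate increases the number of trees proportionally.

```python
import math
import random


def softmax_split(x, r, rng, n_candidates=5, beta=1.0):
    """Draw candidate thresholds uniformly, then pick one via a softmax over scores.

    Assumption for this sketch: the score of a threshold is the between-group
    sum of squares of the residuals r, divided by the sample size.
    """
    lo, hi = min(x), max(x)
    thresholds = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    scores = []
    for t in thresholds:
        left = [ri for xi, ri in zip(x, r) if xi < t]
        right = [ri for xi, ri in zip(x, r) if xi >= t]
        # Empty sides contribute nothing to the score.
        s = sum(sum(g) ** 2 / len(g) for g in (left, right) if g)
        scores.append(s / len(x))
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(beta * (s - top)) for s in scores]
    return rng.choices(thresholds, weights=weights, k=1)[0]


def boost(x, y, lr, horizon, rng, beta=1.0):
    """Gradient boosting with stumps; the tree count is rescaled as horizon / lr."""
    n_trees = round(horizon / lr)
    mean_y = sum(y) / len(y)
    pred = [mean_y] * len(y)  # start from the mean so residuals are centered
    for _ in range(n_trees):
        r = [yi - pi for yi, pi in zip(y, pred)]  # squared-loss residuals
        t = softmax_split(x, r, rng, beta=beta)
        for side in (True, False):
            idx = [i for i, xi in enumerate(x) if (xi < t) == side]
            if idx:
                leaf_mean = sum(r[i] for i in idx) / len(idx)
                for i in idx:
                    pred[i] += lr * leaf_mean  # small step along the fitted stump
    return pred


if __name__ == "__main__":
    rng = random.Random(0)
    x = [i / 49 for i in range(50)]
    y = [math.sin(2 * math.pi * xi) for xi in x]
    for lr in (0.1, 0.05, 0.025):
        pred = boost(x, y, lr=lr, horizon=2.0, rng=rng)
        mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
        print(f"lr={lr}: {round(2.0 / lr)} trees, training MSE={mse:.4f}")
```

In this sketch, `beta = 0` makes the choice uniform among the candidates, in the spirit of totally randomized trees, while a large `beta` concentrates on the best-scoring candidate, in the spirit of Extra Trees. With leaf values equal to leaf means of the residuals and a learning rate in (0, 1], each step leaves the mean residual at zero and does not increase the training squared error, which mirrors in a toy setting the properties stated in the abstract.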