Robust Boosting for Regression Problems
This work addresses robust regression in applications with many explanatory variables, offering a scalable and efficient method that builds incrementally on existing robust linear regression techniques.
The paper tackles regression in the presence of outliers by proposing a two-stage robust boosting algorithm that first minimizes a robust residual scale estimator and then optimizes a bounded loss function. Without outliers, it performs as well as standard gradient boosting with a squared loss; with outliers, it outperforms the alternatives in prediction error and variable selection accuracy.
Gradient boosting algorithms construct a regression predictor using a linear combination of ``base learners''. Boosting also offers an approach to obtaining robust non-parametric regression estimators that are scalable to applications with many explanatory variables. The proposed robust boosting algorithm is based on a two-stage approach, similar to what is done for robust linear regression: it first minimizes a robust residual scale estimator, and then improves on that fit by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad-hoc residual scale estimator in each boosting iteration. Since the loss functions involved in this robust boosting algorithm are typically non-convex, a reliable initialization step is required, such as an L1 regression tree, which is also fast to compute. A robust variable importance measure can also be calculated via a permutation procedure. Thorough simulation studies and several data analyses show that, when no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with a squared loss. Furthermore, when the data contain outliers, the robust boosting estimator outperforms the alternatives in terms of prediction error and variable selection accuracy.
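To make the role of the bounded loss concrete, the following is a minimal sketch and not the algorithm of the paper. It illustrates only the second stage (gradient boosting on Tukey's bisquare loss with a fixed residual scale) and makes several simplifying assumptions: the scale is a MAD computed once from the initial residuals rather than the scale obtained in the first stage, the initial fit is the median of the responses (the simplest stand-in for an L1 regression tree), the base learners are one-dimensional regression stumps, and the tuning constant 4.685 is the conventional bisquare choice. All function names (`tukey_pseudo_residual`, `fit_stump`, `boost`) are hypothetical.

```python
import random
import statistics


def tukey_pseudo_residual(r, cutoff):
    """Negative gradient of Tukey's bisquare loss, up to a positive constant.

    It is close to r for small residuals and exactly 0 beyond the cutoff,
    which is what makes the loss bounded.
    """
    if abs(r) >= cutoff:
        return 0.0
    return r * (1.0 - (r / cutoff) ** 2) ** 2


def fit_stump(x, g):
    """Least-squares regression stump on 1D inputs x fitted to targets g."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    gs = [g[i] for i in order]
    n, total = len(xs), sum(gs)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += gs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between identical inputs
        right_sum = total - left_sum
        # Maximizing this gain is equivalent to minimizing the split's squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            best = (gain, (xs[k - 1] + xs[k]) / 2, left_sum / k, right_sum / (n - k))
    _, threshold, left_value, right_value = best
    return lambda v: left_value if v <= threshold else right_value


def boost(x, y, pseudo_residual, n_iter=300, lr=0.1):
    """Gradient boosting with stumps; the loss enters only via pseudo_residual."""
    init = statistics.median(y)  # assumption: median start instead of an L1 tree
    stumps = []
    pred = [init] * len(y)
    for _ in range(n_iter):
        g = [pseudo_residual(yi - pi) for yi, pi in zip(y, pred)]
        stump = fit_stump(x, g)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: init + lr * sum(s(v) for s in stumps)


# Synthetic data (assumption): f(x) = x with Gaussian noise and 10% gross outliers.
random.seed(0)
n = 200
x = [random.uniform(0.0, 10.0) for _ in range(n)]
truth = x[:]
y = [t + random.gauss(0.0, 0.3) for t in truth]
for i in range(0, n, 10):
    y[i] = 60.0

# Fixed scale: MAD of residuals from the initial fit, computed once (assumption).
med = statistics.median(y)
scale = 1.4826 * statistics.median([abs(yi - med) for yi in y])
cutoff = 4.685 * scale

robust_fit = boost(x, y, lambda r: tukey_pseudo_residual(r, cutoff))
l2_fit = boost(x, y, lambda r: r)  # squared loss: pseudo-residual is the residual


def mae(fit):
    """Mean absolute error against the outlier-free regression function."""
    return sum(abs(fit(xi) - ti) for xi, ti in zip(x, truth)) / n


robust_mae = mae(robust_fit)
l2_mae = mae(l2_fit)
print("robust:", robust_mae, "squared loss:", l2_mae)
```

Because the scale is fixed before boosting starts, no residual scale has to be recomputed inside the loop. Observations whose residuals exceed the cutoff contribute a zero pseudo-residual, so the gross outliers stop influencing the fit, whereas the squared-loss fit is pulled toward them.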