Statistical Robustness of Interval CVaR Based Regression Models under Perturbation and Contamination
This work addresses robustness in statistical learning for researchers and practitioners, providing theoretical foundations for In-CVaR based regression. The contribution is incremental in that it builds on prior work on In-CVaR.
The paper tackles robustness in nonlinear regression under data perturbation and contamination by analyzing interval conditional value-at-risk (In-CVaR) models. It shows that these models exhibit superior robustness with theoretical guarantees, including distributional breakdown points and qualitative robustness under minor assumptions.
Robustness under perturbation and contamination is a prominent issue in statistical learning. We address robust nonlinear regression based on the so-called interval conditional value-at-risk (In-CVaR), which is introduced to enhance robustness by trimming extreme losses. While recent literature shows that In-CVaR based statistical learning is more robust than classical robust regression models, its theoretical robustness analysis for nonlinear regression remains largely unexplored. We rigorously quantify robustness under contamination through a unified study of the distributional breakdown point for a broad class of regression models, including linear, piecewise affine and neural network models with $\ell_1$, $\ell_2$ and Huber losses. Moreover, we analyze the qualitative robustness of the In-CVaR based estimator under perturbation. We show that, under several minor assumptions, the In-CVaR based estimator is qualitatively robust in terms of the Prokhorov metric if and only if the largest portion of losses is trimmed. Overall, this study analyzes the robustness properties of In-CVaR based nonlinear regression models under both perturbation and contamination, and illustrates the advantages of the In-CVaR risk measure over conditional value-at-risk and expectation for robust regression in both theory and numerical experiments.
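To make the idea of trimming extreme losses concrete, the following is a minimal sketch of an empirical In-CVaR, assuming the common reading of In-CVaR as the average of the loss quantiles between two levels. The abstract does not state the definition, so the function name `empirical_in_cvar`, the level parameters `alpha` and `beta`, and the index rounding at the boundaries are assumptions made for illustration rather than the paper's formulation.

```python
import math


def empirical_in_cvar(losses, alpha, beta):
    """Average the sorted losses whose rank fraction lies between alpha and beta.

    Assumed simplification: fractional weights on the two boundary
    samples are ignored; indices are rounded outward instead.
    """
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("require 0 <= alpha < beta <= 1")
    ordered = sorted(losses)
    n = len(ordered)
    if n == 0:
        raise ValueError("losses must be non-empty")
    lo = math.floor(alpha * n)  # number of smallest losses dropped
    hi = math.ceil(beta * n)    # losses ranked at or above hi are trimmed
    kept = ordered[lo:hi]
    return sum(kept) / len(kept)


# Hypothetical losses with one contaminated sample (1000.0).
sample_losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0]

expectation = empirical_in_cvar(sample_losses, 0.0, 1.0)   # plain mean
upper_cvar = empirical_in_cvar(sample_losses, 0.75, 1.0)   # upper-tail average
in_cvar = empirical_in_cvar(sample_losses, 0.0, 0.75)      # largest losses trimmed
```

In this toy sample the plain mean and the upper-tail average are both dominated by the contaminated loss, while the variant that trims the largest portion of losses is not. This is only an illustration of the trimming effect, not a reproduction of the paper's experiments.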