Hajo Holzmann

h-index: 18

2 Papers

ML, Apr 11, 2024
Robust performance metrics for imbalanced classification problems

Hajo Holzmann, Bernhard Klar

We show that established performance metrics in binary classification, such as the F-score, the Jaccard similarity coefficient or Matthews' correlation coefficient (MCC), are not robust to class imbalance in the sense that if the proportion of the minority class tends to $0$, the true positive rate (TPR) of the Bayes classifier under these metrics tends to $0$ as well. Thus, in imbalanced classification problems, these metrics favour classifiers which ignore the minority class. To alleviate this issue we introduce robust modifications of the F-score and the MCC for which, even in strongly imbalanced settings, the TPR is bounded away from $0$. We numerically illustrate the behaviour of the various performance metrics in simulations as well as on a credit default data set. We also discuss connections to the ROC and precision-recall curves and give recommendations on how to combine their usage with performance metrics.
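For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of their standard textbook definitions in terms of confusion-matrix counts. It does not reproduce the robust modifications proposed in the paper, which are not specified here. The function name `confusion_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Standard TPR, F1, Jaccard and MCC from confusion-matrix counts.

    Each metric is returned as 0.0 when its denominator is zero
    (a convention chosen for this sketch, not taken from the paper).
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    jac_den = tp + fp + fn
    jaccard = tp / jac_den if jac_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"tpr": tpr, "f1": f1, "jaccard": jaccard, "mcc": mcc}


# Hypothetical imbalanced data: 10 positives among 1000 cases.
# "cautious" rarely predicts the minority class, "eager" predicts it often.
cautious = confusion_metrics(tp=1, fp=0, fn=9, tn=990)
eager = confusion_metrics(tp=8, fp=40, fn=2, tn=950)

for name, m in (("cautious", cautious), ("eager", eager)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Comparing the two hypothetical classifiers side by side shows how each metric trades off the TPR against false positives when the minority class is small.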

ST, Nov 3, 2020
Support estimation in high-dimensional heteroscedastic mean regression

Philipp Hermann, Hajo Holzmann

A current strand of research in high-dimensional statistics deals with robustifying the available methodology with respect to deviations from the pervasive light-tail assumptions. In this paper we consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors, and investigate support estimation in this framework. We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem, as well as the adaptive LASSO penalty for computational efficiency. For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm as in the homoscedastic, light-tailed setting. In our analysis, we have to deal with the issue that the support of the target parameter in the linear mean regression model and its robustified version may differ substantially even for small values of the tuning parameter of the Huber loss function. Simulations illustrate the favorable numerical performance of the proposed methodology.
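The abstract does not state which smooth variant of the Huber loss is used. As an illustration only, the sketch below compares the classical Huber loss with the pseudo-Huber loss, one well-known smooth and strictly convex relative. Treating the pseudo-Huber loss as the variant in question is an assumption of this sketch, and the function names and the tuning parameter name `delta` are hypothetical.

```python
import math


def huber(r, delta):
    """Classical Huber loss: quadratic near zero, linear in the tails."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def pseudo_huber(r, delta):
    """Pseudo-Huber loss: infinitely differentiable and strictly convex.

    Assumed here as an example of a smooth Huber variant; the paper's
    exact loss may differ.
    """
    return delta * delta * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def pseudo_huber_grad(r, delta):
    """Derivative of the pseudo-Huber loss; bounded in absolute value by delta."""
    return r / math.sqrt(1.0 + (r / delta) ** 2)


# Both losses grow roughly linearly for large residuals, which limits
# the influence of heavy-tailed errors on the fit.
for r in (0.1, 1.0, 10.0, 100.0):
    print(r, huber(r, 1.0), pseudo_huber(r, 1.0), pseudo_huber_grad(r, 1.0))
```

The bounded derivative is what gives losses of this type their robustness to heavy-tailed errors: a single very large residual contributes at most `delta` in absolute value to the gradient.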