Robust performance metrics for imbalanced classification problems
This work addresses a critical issue in imbalanced classification problems, such as fraud detection or medical diagnosis, by providing more reliable performance metrics. The contribution is incremental, since it modifies existing metrics rather than proposing a new paradigm.
The paper shows that standard classification metrics such as the F-score and MCC are not robust to class imbalance: as the proportion of the minority class tends to zero, the true positive rate (TPR) of the Bayes classifier under these metrics tends to zero as well. It introduces modified versions whose TPR stays bounded away from zero even in highly imbalanced settings, and validates them numerically on simulations and a credit default dataset.
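For reference, the metrics named above have standard definitions in terms of the confusion-matrix counts TP, FP, FN and TN. The sketch below computes them. The function name `confusion_metrics` and the convention of returning 0.0 when a denominator vanishes are assumptions made for illustration. The paper's robust modifications are not reproduced because their exact form is not stated here.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumed convention (common in practice, not taken from the paper):
    a metric whose denominator is zero is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den > 0 else 0.0

    tpr = ratio(tp, tp + fn)                 # true positive rate (recall)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)     # harmonic mean of precision and TPR
    jaccard = ratio(tp, tp + fp + fn)        # Jaccard similarity coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)  # Matthews' correlation coefficient
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {"tpr": tpr, "precision": precision, "f1": f1,
            "jaccard": jaccard, "mcc": mcc, "accuracy": accuracy}


if __name__ == "__main__":
    # 1,000 samples with a 1% minority class (10 positives).
    ignore_minority = confusion_metrics(tp=0, fp=0, fn=10, tn=990)
    finds_most = confusion_metrics(tp=8, fp=20, fn=2, tn=970)
    print(ignore_minority)
    print(finds_most)
```

In this example the classifier that labels everything negative reaches 99% accuracy but an F1 score, TPR and MCC of 0, while the one that catches 8 of the 10 positives at the cost of 20 false alarms scores F1 = 16/38 ≈ 0.42. Note also that F1 and the Jaccard coefficient J are monotonically related via F1 = 2J/(1 + J), so they rank classifiers identically.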
We show that established performance metrics in binary classification, such as the F-score, the Jaccard similarity coefficient or Matthews' correlation coefficient (MCC), are not robust to class imbalance in the sense that if the proportion of the minority class tends to $0$, the true positive rate (TPR) of the Bayes classifier under these metrics tends to $0$ as well. Thus, in imbalanced classification problems, these metrics favour classifiers which ignore the minority class. To alleviate this issue we introduce robust modifications of the F-score and the MCC for which, even in strongly imbalanced settings, the TPR is bounded away from $0$. We numerically illustrate the behaviour of the various performance metrics in simulations as well as on a credit default data set. We also discuss connections to the ROC and precision-recall curves and give recommendations on how to combine their usage with performance metrics.
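The non-robustness of the F-score can be seen in a small population-level experiment. The following is a minimal sketch under an assumed model that is not taken from the paper: negatives satisfy $X \sim N(0,1)$, positives satisfy $X \sim N(\mu,1)$ with $\mu = 2$, and the minority proportion is $\pi$. Because the posterior $P(Y=1 \mid X=x)$ increases with $x$ in this model, we restrict to threshold rules "predict positive iff $x > t$", write the population F1 score in closed form as $2\pi\,\mathrm{TPR}/(\pi\,\mathrm{TPR} + \pi + (1-\pi)\,\mathrm{FPR})$, and locate its maximiser on a grid. The helper names and grid parameters are hypothetical.

```python
import math


def gauss_sf(z):
    """Survival function P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def population_f1(t, pi, mu):
    """Population F1 and TPR of the rule 'positive iff x > t'.

    Assumed model: X | negative ~ N(0, 1), X | positive ~ N(mu, 1),
    P(positive) = pi.
    """
    tpr = gauss_sf(t - mu)  # P(X > t | positive)
    fpr = gauss_sf(t)       # P(X > t | negative)
    tp, fp, fn = pi * tpr, (1 - pi) * fpr, pi * (1 - tpr)
    return 2 * tp / (2 * tp + fp + fn), tpr


def tpr_at_f1_optimum(pi, mu=2.0, lo=-4.0, hi=12.0, step=1e-3):
    """TPR of the grid threshold that maximises the population F1 score."""
    best_f1, best_tpr = -1.0, 0.0
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        f1, tpr = population_f1(lo + i * step, pi, mu)
        if f1 > best_f1:  # keep the first (smallest) maximising threshold
            best_f1, best_tpr = f1, tpr
    return best_tpr


if __name__ == "__main__":
    for pi in (0.1, 0.01, 1e-3, 1e-4, 1e-6):
        print(f"pi = {pi:g}: TPR at F1-optimal threshold = {tpr_at_f1_optimum(pi):.4f}")
```

In this model the TPR at the F1-optimal threshold shrinks steadily as $\pi$ decreases, from above 0.5 at $\pi = 0.1$ to below 0.01 at $\pi = 10^{-6}$: the optimal threshold moves into the tail of the negative class to keep false positives comparable to the shrinking number of positives, which is the behaviour the abstract describes.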