A better Beta for the H measure of classification performance
This work offers an incremental improvement to the H measure that is relevant to researchers evaluating classification performance, particularly on unbalanced data.
The paper builds on the H measure, which was introduced to address the incoherence of the area under the ROC curve, and proposes a modified standard distribution for it, the Beta(π₁+1,π₀+1) distribution, to better handle heavily unbalanced datasets.
The area under the ROC curve is widely used as a measure of the performance of classification rules. However, it has recently been shown that the measure is fundamentally incoherent, in the sense that it treats the relative severities of misclassifications differently when different classifiers are used. To overcome this, Hand (2009) proposed the $H$ measure, which allows a researcher to fix the distribution of relative severities to a classifier-independent setting on a given problem. This note extends the discussion and proposes a modified standard distribution for the $H$ measure, the $\mathrm{Beta}(\pi_1+1,\pi_0+1)$ distribution, which better matches the requirements of researchers, in particular those faced with heavily unbalanced datasets. [Preprint submitted to Pattern Recognition Letters]
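One way to see how the proposed distribution reacts to class imbalance is through its mode. A Beta(a, b) density with a, b > 1 peaks at (a-1)/(a+b-2). Assuming $\pi_0$ and $\pi_1$ denote the class priors with $\pi_0+\pi_1=1$ (the abstract does not define them), the $\mathrm{Beta}(\pi_1+1,\pi_0+1)$ density therefore peaks at $\pi_1$, whereas a symmetric choice such as Beta(2,2) peaks at 1/2 whatever the class balance. The Python sketch below checks this numerically. The function names and the prior value 0.05 are hypothetical, and Beta(2,2) appears only as an illustrative symmetric comparison; this is not an implementation of the $H$ measure itself.

```python
import math


def beta_pdf(c, a, b):
    """Density of Beta(a, b) at c in (0, 1), using log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(c) + (b - 1) * math.log(1 - c))


def beta_mode(a, b):
    """Mode of Beta(a, b); this closed form is valid only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("closed-form mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def proposed_params(pi1):
    """Shape parameters (pi1 + 1, pi0 + 1), assuming pi0 = 1 - pi1."""
    if not 0 < pi1 < 1:
        raise ValueError("pi1 must lie strictly between 0 and 1")
    pi0 = 1.0 - pi1
    return pi1 + 1.0, pi0 + 1.0


# Hypothetical heavily unbalanced problem: 5% of cases in class 1.
pi1 = 0.05
a, b = proposed_params(pi1)

# Mode of the proposed distribution versus a symmetric Beta(2, 2).
mode_proposed = beta_mode(a, b)
mode_symmetric = beta_mode(2.0, 2.0)

# Locate the peak on a grid as an independent check of the closed form.
grid = [i / 1000 for i in range(1, 1000)]
grid_peak = max(grid, key=lambda c: beta_pdf(c, a, b))

print(f"proposed shape parameters: a={a:.2f}, b={b:.2f}")
print(f"mode (closed form): {mode_proposed:.4f}, grid peak: {grid_peak:.3f}")
print(f"mode of symmetric Beta(2, 2): {mode_symmetric:.4f}")
```

Changing the hypothetical `pi1` moves the peak of the proposed density with it, which is the sense in which the distribution adapts to the degree of imbalance.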