LG ML Jul 3, 2020

The Effect of Class Imbalance on Precision-Recall Curves

arXiv:2007.01905v3 56 citations
Originality Synthesis-oriented
AI Analysis

This note addresses a fundamental evaluation issue for practitioners working with imbalanced datasets, but it is incremental because it builds on existing theoretical work.

The study analyzes how class imbalance affects precision-recall curves by deriving the relationship between precision and the ratio of positive to negative cases, enabling predictions of changes in these curves and in related metrics such as $F_\beta$.

In this note I study how the precision of a classifier depends on the ratio $r$ of positive to negative cases in the test set, as well as the classifier's true and false positive rates. This relationship allows prediction of how the precision-recall curve will change with $r$, which seems not to be well known. It also allows prediction of how $F_\beta$ and the Precision Gain and Recall Gain measures of Flach and Kull (2015) vary with $r$.
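The dependence described above can be illustrated with a short sketch. It is not taken from the paper. It uses the standard identity that follows from the definitions: with $P$ positives and $N$ negatives, true positives number $\mathrm{TPR} \cdot P$ and false positives $\mathrm{FPR} \cdot N$, so precision is $r \cdot \mathrm{TPR} / (r \cdot \mathrm{TPR} + \mathrm{FPR})$ with $r = P/N$. The function names and the example rates are illustrative assumptions.

```python
def precision_from_rates(tpr: float, fpr: float, r: float) -> float:
    """Precision implied by the true/false positive rates and r = positives / negatives.

    Uses precision = TP / (TP + FP) with TP = tpr * P and FP = fpr * N,
    then divides numerator and denominator by N.
    """
    denom = r * tpr + fpr
    if denom == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return r * tpr / denom


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; returns 0.0 when precision and recall are both zero."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


if __name__ == "__main__":
    # Hypothetical operating point: the classifier's rates stay fixed
    # while the class ratio r of the test set changes.
    tpr, fpr = 0.8, 0.1
    for r in (1.0, 0.1, 0.01):
        p = precision_from_rates(tpr, fpr, r)
        # Recall equals the true positive rate, so it does not depend on r.
        print(f"r={r:<5} precision={p:.4f} F1={f_beta(p, tpr):.4f}")
```

Because recall equals the true positive rate, only the precision coordinate of each point on the curve moves as $r$ changes. In this example precision falls from about 0.89 at $r = 1$ to about 0.07 at $r = 0.01$, even though the classifier itself is unchanged.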
