LG, ME, ML · Oct 11, 2020

Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

arXiv:2010.16061v1 · 6575 citations
Originality: Incremental advance
AI Analysis

This paper addresses a foundational problem in machine learning evaluation by proposing more robust metrics, though its advance is incremental, refining existing statistical concepts.

The paper critiques common evaluation measures like Recall, Precision, and F-Measure for being biased and misleading, and introduces Informedness and Markedness as alternatives that better reflect informed versus chance prediction probabilities, with connections to Correlation and Significance.

Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases and a corresponding identification of the chance or base-case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance, namely Informedness, and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
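The abstract's central claim can be illustrated with a small sketch for the dichotomous case. It uses the standard definitions Informedness = Recall + Inverse Recall - 1 and Markedness = Precision + Inverse Precision - 1, and checks that the Matthews correlation equals the geometric mean of the two (up to sign). The `Confusion` class, the function names, and the two example systems are hypothetical, chosen only for illustration; the counts are made up and are not results from the paper. The sketch assumes every row and column total of the confusion matrix is nonzero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def inverse_recall(c: Confusion) -> float:
    # Recall on the negative class (true negative rate).
    return c.tn / (c.tn + c.fp)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def inverse_precision(c: Confusion) -> float:
    # Precision on the negative class (negative predictive value).
    return c.tn / (c.tn + c.fn)


def f1(c: Confusion) -> float:
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def informedness(c: Confusion) -> float:
    return recall(c) + inverse_recall(c) - 1.0


def markedness(c: Confusion) -> float:
    return precision(c) + inverse_precision(c) - 1.0


def matthews(c: Confusion) -> float:
    num = c.tp * c.tn - c.fp * c.fn
    den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return num / den


# Illustrative counts on 90 positives and 10 negatives (made-up data).
system_a = Confusion(tp=85, fp=9, fn=5, tn=1)   # labels almost everything positive
system_b = Confusion(tp=70, fp=2, fn=20, tn=8)  # discriminates both classes

for name, c in (("A", system_a), ("B", system_b)):
    print(
        f"{name}: F1={f1(c):.3f} accuracy={accuracy(c):.3f} "
        f"informedness={informedness(c):.3f} markedness={markedness(c):.3f} "
        f"correlation={matthews(c):.3f}"
    )
```

With these counts, system A scores higher on F-Measure and Rand Accuracy simply because it follows the majority class, while system B scores far higher on Informedness. This is the kind of reversal the abstract warns about.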

