IR CL LG NE CO ML

Mar 22, 2015

What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes

arXiv:1503.06410v218.6103 citations

Originality: Incremental advance

AI Analysis

The paper addresses a critical issue for researchers and practitioners in IR, NLP, and ML: a widely used evaluation metric, the F-measure, is flawed, and better alternatives exist.

The paper identifies fundamental flaws in the F-measure, arguing that it rests on a mistake and is unsuitable for most contexts, and it points to better alternatives.

The F-measure or F-score is one of the most commonly used single number measures in Information Retrieval, Natural Language Processing and Machine Learning, but it is based on a mistake, and the flawed assumptions render it unsuitable for use in most contexts! Fortunately, there are better alternatives.
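The abstract does not spell out the flaws it refers to. As a minimal sketch of one well-known property of the standard definition (the weighted harmonic mean of precision and recall), the Python below computes the F-measure from confusion-matrix counts and shows that the number of true negatives never enters the calculation. The function name `f_measure`, the helper `score`, and the example counts are illustrative assumptions, not taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F_beta from confusion-matrix counts.

    Standard definition: weighted harmonic mean of precision and recall.
    Note that the true-negative count is not an argument at all.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: identical tp, fp, fn; only tn differs.
small_tn = {"tp": 40, "fp": 10, "fn": 20, "tn": 30}
large_tn = {"tp": 40, "fp": 10, "fn": 20, "tn": 30000}


def score(counts):
    # tn is deliberately ignored, because the F-measure has no term for it.
    return f_measure(counts["tp"], counts["fp"], counts["fn"])


if __name__ == "__main__":
    print(score(small_tn), score(large_tn))
```

Both calls return the same value, so two systems that differ only in how many negatives they correctly reject are indistinguishable under this measure.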
