IR · CL · LG · NE · CO · ML · Mar 22, 2015

What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes

arXiv:1503.06410v2 · 103 citations
Originality: Incremental advance
AI Analysis

This paper addresses a critical issue for researchers and practitioners in IR, NLP, and ML. It exposes the flaws in a widely used metric and points to better alternatives.

The paper identifies fundamental flaws in the F-measure, arguing it is based on a mistake and unsuitable for most contexts, and proposes better alternatives as solutions.

The F-measure or F-score is one of the most commonly used single number measures in Information Retrieval, Natural Language Processing and Machine Learning, but it is based on a mistake, and the flawed assumptions render it unsuitable for use in most contexts! Fortunately, there are better alternatives.
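One concrete way to see what the F-measure leaves out is to compute it from a confusion matrix. The sketch below makes several assumptions. It uses the standard balanced F1 (the harmonic mean of precision and recall, which reduces to 2TP / (2TP + FP + FN)), so true negatives never enter the formula. It also uses Informedness (recall + inverse recall − 1) as one illustrative alternative that does depend on true negatives. The abstract does not name its alternatives, so that choice is ours. The `Confusion` type and both helper functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def f1(c: Confusion) -> float:
    # Balanced F-measure: harmonic mean of precision and recall,
    # algebraically equal to 2TP / (2TP + FP + FN). TN never appears.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def informedness(c: Confusion) -> float:
    # Illustrative alternative (our assumption): TPR + TNR - 1.
    # 0 means chance level; negative means worse than chance.
    tpr = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    tnr = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    return tpr + tnr - 1


# Same TP/FP/FN, very different handling of the negative class.
a = Confusion(tp=40, fp=10, fn=10, tn=940)  # most negatives correctly rejected
b = Confusion(tp=40, fp=10, fn=10, tn=0)    # every negative wrongly accepted

print(f"F1:           a={f1(a):.3f}  b={f1(b):.3f}")
print(f"Informedness: a={informedness(a):.3f}  b={informedness(b):.3f}")
```

Both systems receive the same F1 of 0.8. Informedness, by contrast, separates the system that rejects most negatives from the one that rejects none, and rates the latter as worse than chance.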

