CL, LG | May 19, 2022

Why only Micro-F1? Class Weighting of Measures for Relation Classification

arXiv:2205.09460v1 | 642 citations | h-index: 15
Originality: Synthesis-oriented
AI Analysis

This work addresses how relation classification models are evaluated, but it is incremental, since it builds on existing weighting schemes.

The paper tackles the problem of evaluating relation classification models on imbalanced datasets by analyzing weighting schemes like micro-F1 and macro-F1, introducing a framework with new intermediate schemes, and showing that reporting multiple schemes better highlights model strengths and weaknesses.

Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.
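To make the difference between the two extreme schemes concrete, here is a minimal sketch in plain Python that computes micro-F1 and macro-F1 on a toy imbalanced dataset. The function names and the toy labels are assumptions for illustration, not taken from the paper or its repository, and the paper's two intermediate schemes are not reproduced here. The sketch also counts every class, whereas relation classification benchmarks often exclude a negative class from the score.

```python
from collections import Counter


def per_class_counts(gold, pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold, pred):
    """Pool counts over all classes, so frequent classes dominate."""
    tp, fp, fn = per_class_counts(gold, pred)
    return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_f1(gold, pred):
    """Average per-class F1, so every class has equal weight."""
    tp, fp, fn = per_class_counts(gold, pred)
    classes = sorted(set(gold) | set(pred))
    return sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)


# Hypothetical toy data: 9 instances of class "A", 1 of class "B",
# and a model that always predicts the majority class.
gold = ["A"] * 9 + ["B"]
pred = ["A"] * 10

print(f"micro-F1: {micro_f1(gold, pred):.3f}")  # 0.900
print(f"macro-F1: {macro_f1(gold, pred):.3f}")  # 0.474
```

The majority-class predictor looks strong under micro-F1 and weak under macro-F1, which is the kind of disagreement that reporting more than one weighting scheme exposes.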

Code Implementations: 1 repo
