CL LG May 19, 2022

Why only Micro-F1? Class Weighting of Measures for Relation Classification

David Harbecke, Yuxuan Chen, Leonhard Hennig, Christoph Alt

arXiv:2205.09460v132.0642 citations h-index: 15 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses how relation classification models are evaluated. The contribution is incremental: it builds on existing weighting schemes such as micro and macro averaging.

The paper tackles the problem of evaluating relation classification models on imbalanced datasets. It analyzes weighting schemes such as micro and macro averaging, introduces a framework in which these existing schemes are the extremes, adds two new intermediate schemes, and shows that reporting results under several schemes better highlights a model's strengths and weaknesses.

Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.
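
To make the micro/macro contrast concrete, here is a minimal Python sketch on a toy imbalanced label set. It is not the paper's framework: the `support ** alpha` weighting family, the function names, and the toy labels are all assumptions chosen for illustration, and the abstract does not specify the two intermediate schemes. The sketch also assumes a single-label setting with every class scored (no excluded negative class), in which micro-F1 equals accuracy.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[c] = (f1, tp + fn)  # support = number of gold instances of c
    return scores


def weighted_f1(y_true, y_pred, alpha):
    """Average per-class F1 with weights support**alpha.

    Hypothetical family for illustration only: alpha=0 gives macro-F1
    (all classes equal), alpha=1 gives the support-weighted average.
    """
    scores = per_class_f1(y_true, y_pred)
    weights = {c: support ** alpha for c, (_, support) in scores.items()}
    total = sum(weights.values())
    return sum(weights[c] * f1 for c, (f1, _) in scores.items()) / total


def micro_f1(y_true, y_pred):
    """Pool TP/FP/FN over all classes; equals accuracy in this setting."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: 8 instances of a frequent relation, 2 of a rare one.
    y_true = ["frequent"] * 8 + ["rare"] * 2
    y_pred = ["frequent"] * 8 + ["frequent", "rare"]

    print("micro          ", round(micro_f1(y_true, y_pred), 3))
    print("macro (alpha=0)", round(weighted_f1(y_true, y_pred, 0.0), 3))
    print("alpha=0.5      ", round(weighted_f1(y_true, y_pred, 0.5), 3))
    print("alpha=1        ", round(weighted_f1(y_true, y_pred, 1.0), 3))
```

On this toy data the macro score is the lowest and the micro score the highest, because the single error falls on the rare class. Reporting the scores side by side exposes that weakness, which a lone micro-F1 figure would hide.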
