CL AI LG Jun 2, 2021

Posthoc Verification and the Fallibility of the Ground Truth

Yifan Ding, Nicholas Botzer, Tim Weninger

arXiv:2106.07353v130.1626 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses evaluation validity issues for NLP researchers and practitioners, particularly in entity linking, but it is incremental because it adapts existing verification approaches.

The paper tackles noisy ground truth labels and strict evaluation metrics in entity linking by introducing a posthoc verification method. It finds that state-of-the-art models performed extremely well under posthoc evaluation and that their predictions had a verification rate similar to or higher than that of the ground truth.

Classifiers commonly make use of pre-annotated datasets, wherein a model is evaluated by pre-defined metrics on a held-out test set typically made of human-annotated labels. Metrics used in these evaluations are tied to the availability of well-defined ground truth labels, and these metrics typically do not allow for inexact matches. These noisy ground truth labels and strict evaluation metrics may compromise the validity and realism of evaluation results. In the present work, we discuss these concerns and conduct a systematic posthoc verification experiment on the entity linking (EL) task. Unlike traditional methodologies, which ask annotators to provide free-form annotations, we ask annotators to verify the correctness of annotations after the fact (i.e., posthoc). Compared to pre-annotation evaluation, state-of-the-art EL models performed extremely well according to the posthoc evaluation methodology. Posthoc validation also permits the validation of the ground truth dataset. Surprisingly, we find predictions from EL models had a similar or higher verification rate than the ground truth. We conclude with a discussion on these findings and recommendations for future evaluations.
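The contrast between strict pre-annotation scoring and posthoc verification can be illustrated with a minimal sketch. It is not the authors' code: the `Annotation` type, both function names, and the toy mentions, entities, and verdicts are all hypothetical, chosen only to show how an exact-match metric can penalize a prediction that annotators would accept when asked after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A (mention, linked entity) pair; hypothetical type for illustration."""
    mention: str
    entity: str


def exact_match_precision(predicted, gold):
    """Fraction of predictions that appear verbatim in the gold labels."""
    if not predicted:
        return 0.0
    gold_set = set(gold)
    return sum(a in gold_set for a in predicted) / len(predicted)


def verification_rate(annotations, verdicts):
    """Fraction of annotations that annotators marked correct after the fact."""
    if not annotations:
        return 0.0
    return sum(verdicts[a] for a in annotations) / len(annotations)


# Toy data (invented for illustration, not taken from the paper).
gold = [
    Annotation("Paris", "Paris"),
    Annotation("Jordan", "Michael_Jordan"),  # assume this gold label is wrong
]
predicted = [
    Annotation("Paris", "Paris"),
    Annotation("Jordan", "Michael_I._Jordan"),
    Annotation("France", "France"),  # correct, but missing from gold
]

# Hypothetical posthoc verdicts: True means annotators verified the link.
verdicts = {
    Annotation("Paris", "Paris"): True,
    Annotation("Jordan", "Michael_Jordan"): False,
    Annotation("Jordan", "Michael_I._Jordan"): True,
    Annotation("France", "France"): True,
}

strict_score = exact_match_precision(predicted, gold)
model_rate = verification_rate(predicted, verdicts)
gold_rate = verification_rate(gold, verdicts)
```

In this toy setup the model is penalized under exact matching for two links that the verdicts accept, while the gold set itself contains a link that the verdicts reject. That is the kind of discrepancy the abstract describes, though the numbers here carry no empirical weight.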
