CL AI Mar 23, 2021

TMR: Evaluating NER Recall on Tough Mentions

arXiv:2103.12312v132.7803 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more nuanced evaluation metrics in NER research and is most relevant to researchers and practitioners concerned with model robustness and performance on difficult cases. Its contribution is incremental: it supplements existing metrics rather than introducing a new method.

The authors propose Tough Mentions Recall (TMR) metrics for evaluating named entity recognition (NER) systems. The metrics measure recall on challenging subsets of mentions, namely unseen and type-confusable mentions. The authors demonstrate the metrics' utility by revealing subtle performance differences between BERT and Flair on English corpora and by identifying a weak spot in the performance of current models on Spanish.

We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data. We demonstrate the usefulness of these metrics by evaluating corpora of English, Spanish, and Dutch using five recent neural architectures. We identify subtle differences between the performance of BERT and Flair on two English NER corpora and identify a weak spot in the performance of current models in Spanish. We conclude that the TMR metrics enable differentiation between otherwise similar-scoring systems and identification of patterns in performance that would go unnoticed from overall precision, recall, and F1.
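The subset definitions in the abstract can be illustrated with a minimal sketch. It assumes mentions are represented as exact spans with a token tuple and an entity type, and that a gold mention counts as recalled only when an identical mention (same span, same type) is predicted. The names `Mention`, `tough_subsets`, and `recall`, the subset keys, and the toy data are hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical mention record: location, surface tokens, entity type."""
    sent_id: int
    start: int
    end: int
    tokens: tuple
    label: str


def tough_subsets(train, test_gold):
    """Split gold test mentions into the 'tough' subsets named in the abstract."""
    train_tokens = {m.tokens for m in train}
    train_pairs = {(m.tokens, m.label) for m in train}

    # Collect every entity type each token sequence carries in the test data.
    types_in_test = defaultdict(set)
    for m in test_gold:
        types_in_test[m.tokens].add(m.label)

    return {
        # Token sequence never observed as a mention in training.
        "unseen_tokens": [m for m in test_gold if m.tokens not in train_tokens],
        # Token/type combination never observed in training.
        "unseen_token_type": [
            m for m in test_gold if (m.tokens, m.label) not in train_pairs
        ],
        # Token sequence appears with more than one type in the test data.
        "type_confusable": [
            m for m in test_gold if len(types_in_test[m.tokens]) > 1
        ],
    }


def recall(gold_subset, predicted):
    """Fraction of gold mentions matched exactly; None for an empty subset."""
    if not gold_subset:
        return None
    predicted_set = set(predicted)
    hits = sum(1 for m in gold_subset if m in predicted_set)
    return hits / len(gold_subset)


# Toy data, invented purely for illustration.
train = [Mention(0, 0, 1, ("Washington",), "LOC")]
test_gold = [
    Mention(0, 0, 1, ("Washington",), "LOC"),
    Mention(1, 0, 1, ("Washington",), "PER"),
    Mention(2, 3, 5, ("New", "Delhi"), "LOC"),
]
predicted = [
    Mention(0, 0, 1, ("Washington",), "LOC"),
    Mention(1, 0, 1, ("Washington",), "LOC"),  # wrong type, so not a hit
    Mention(2, 3, 5, ("New", "Delhi"), "LOC"),
]

subsets = tough_subsets(train, test_gold)
scores = {name: recall(subset, predicted) for name, subset in subsets.items()}
print(scores)
```

In this toy case overall recall is 2/3, while recall on the type-confusable subset is 1/2. This kind of gap is what subset-level recall is meant to surface.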
