CL · Nov 11, 2016

Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

arXiv:1611.03641v2 · 22 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses reliability issues in word similarity evaluation for NLP researchers, but it is incremental: it builds on existing dataset and performance-measure frameworks.

The authors tackle unreliable word similarity evaluation in two ways: they redesign the annotation task to increase inter-rater agreement, and they define a performance measure that accounts for the reliability of each annotation decision. The stated goal is a more reliable evaluation.

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.
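
To make the idea of a reliability-aware performance measure concrete, here is a minimal sketch. It is not the authors' definition: the abstract does not give the formula, so this assumes each annotation decision is a pairwise preference ("`preferred` is more similar to `target` than `other`") carrying a reliability weight, and scores a model by the reliability-weighted share of decisions it agrees with. All names (`Decision`, `reliability_weighted_score`, the `reliability` field) and the toy numbers are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple


class Decision(NamedTuple):
    """One annotation decision (hypothetical layout, not the paper's format)."""
    target: str
    preferred: str      # word annotators judged more similar to `target`
    other: str          # word annotators judged less similar to `target`
    reliability: float  # assumed weight in [0, 1], e.g. from annotator agreement


def reliability_weighted_score(
    decisions: Iterable[Decision],
    similarity: Callable[[str, str], float],
) -> float:
    """Reliability-weighted fraction of decisions the model agrees with."""
    total = 0.0
    earned = 0.0
    for d in decisions:
        total += d.reliability
        # The model agrees when it ranks the preferred word strictly higher.
        if similarity(d.target, d.preferred) > similarity(d.target, d.other):
            earned += d.reliability
    if total == 0.0:
        raise ValueError("no decisions with positive reliability")
    return earned / total


# Toy model similarities (made-up values for illustration only).
toy_scores = {
    ("cat", "dog"): 0.8, ("cat", "car"): 0.1,
    ("cup", "mug"): 0.3, ("cup", "plate"): 0.6,
    ("sun", "moon"): 0.7, ("sun", "shoe"): 0.2,
}

decisions = [
    Decision("cat", "dog", "car", reliability=1.0),    # model agrees
    Decision("cup", "mug", "plate", reliability=0.5),  # model disagrees
    Decision("sun", "moon", "shoe", reliability=0.5),  # model agrees
]

score = reliability_weighted_score(decisions, lambda a, b: toy_scores[(a, b)])
print(score)  # 0.75
```

In this sketch, a mistake on a decision with high annotator agreement costs more than a mistake on a contested one, which is the general behaviour the abstract describes. An unweighted accuracy over the same three decisions would be 2/3 rather than 0.75.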

Code Implementations: 1 repo
