CL · Oct 13, 2017

Complex Word Identification: Challenges in Data Annotation and System Performance

arXiv:1710.04989v1 · 1095 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses challenges in natural language processing that affect tasks such as text simplification, but its contribution is incremental because it builds on prior shared task work.

The paper investigates the problem of complex word identification (CWI) by analyzing system performance on the SemEval CWI dataset, finding that most systems performed poorly, partly due to issues in human annotation methods.

This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. Furthermore, we analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and one of the reasons for that is the way in which human annotation was performed.

