CL · Jun 3, 2023

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

arXiv:2306.01966v2225 citations · h-index: 19
Originality: Synthesis-oriented
AI Analysis

This paper provides a domain-specific challenge set for NLP evaluation, though the contribution is incremental because it builds on existing evaluation frameworks.

The authors introduced GENTLE, a 17K-token English corpus with 8 unusual text types, to evaluate NLP systems on tasks like parsing and entity recognition, finding severe performance degradation in some genres.

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems.

Code Implementations: 1 repo
