CL · Oct 21, 2022

CEFR-Based Sentence Difficulty Annotation and Assessment

arXiv:2210.11766v1 · 300 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in controllable text simplification for language learners and teachers, although its method is an incremental advance.

The authors address the lack of a corpus annotated with sentence difficulty levels for language learning by creating the CEFR-SP corpus: 17k English sentences annotated with CEFR levels by English-education professionals. Their sentence-level assessment model achieves a macro-F1 score of 84.5%, outperforming strong baselines from readability assessment.

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with the levels based on the Common European Framework of Reference for Languages assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.
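The abstract reports macro-F1 and notes that the most basic and most proficient sentences are scarce. The following is a minimal sketch of why macro-F1 suits that setting; it is not the authors' evaluation code. The six level labels are the standard CEFR levels, while the toy `gold` and `pred` lists and the helper names `macro_f1` and `accuracy` are assumptions made up for illustration.

```python
# Illustrative sketch only: toy labels, not data from the CEFR-SP corpus.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def macro_f1(gold, pred, labels=CEFR_LEVELS):
    """Unweighted mean of per-level F1, so rare levels count as much as common ones."""
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the level never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(gold, pred):
    """Fraction of sentences whose predicted level matches the gold level."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Hypothetical unbalanced distribution: the extreme levels A1 and C2 are rare.
gold = ["A1"] * 1 + ["A2"] * 4 + ["B1"] * 5 + ["B2"] * 5 + ["C1"] * 4 + ["C2"] * 1
# Hypothetical classifier that never predicts the extremes (A1 -> A2, C2 -> C1).
pred = ["A2"] * 1 + ["A2"] * 4 + ["B1"] * 5 + ["B2"] * 5 + ["C1"] * 4 + ["C1"] * 1

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

In this toy case, accuracy stays high even though the classifier never recognizes A1 or C2, whereas macro-F1 drops sharply because each level contributes equally to the average.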

Code Implementations: 1 repo