CL | Oct 21, 2022

CEFR-Based Sentence Difficulty Annotation and Assessment

Yuki Arase, Satoru Uchida, Tomoyuki Kajiwara

arXiv:2210.11766v124.6300 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in controllable text simplification for language learners and teachers, though its method is an incremental advance.

The authors address the lack of a corpus annotated with sentence difficulty levels for language learning by creating the CEFR-SP corpus, which contains 17k English sentences annotated by English-education professionals. Their assessment model achieved a macro-F1 score of 84.5%, outperforming the readability-assessment baselines.

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with the levels based on the Common European Framework of Reference for Languages assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.
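
The abstract reports macro-F1 and notes that the most basic and most proficient sentences are scarce. The sketch below illustrates why macro-F1 suits such an unbalanced level distribution: it averages per-level F1 without weighting by frequency, so errors on rare levels count as much as errors on common ones. The label lists `gold` and `pred` are invented toy data, not drawn from CEFR-SP, and `macro_f1` and `accuracy` are illustrative helpers rather than the authors' evaluation code.

```python
# CEFR levels, from most basic (A1) to most proficient (C2).
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def macro_f1(gold, pred, labels=LEVELS):
    """Unweighted mean of per-label F1 scores.

    A label with no true positives, false positives, or false negatives
    is scored 0.0 here (an assumed convention for this sketch).
    """
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(gold, pred):
    """Fraction of sentences whose predicted level matches the gold level."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Toy data (hypothetical): rare A1 and C2 sentences, common mid levels.
gold = ["A1"] + ["A2"] * 4 + ["B1"] * 6 + ["B2"] * 6 + ["C1"] * 2 + ["C2"]
# A classifier that is perfect on frequent levels but never predicts A1 or C2.
pred = ["A2"] + ["A2"] * 4 + ["B1"] * 6 + ["B2"] * 6 + ["C1"] * 2 + ["C1"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

On this toy data the classifier misses only two of twenty sentences, yet both misses fall on the rare extreme levels, so macro-F1 comes out well below accuracy.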

