CL · Oct 24, 2024

Difficult for Whom? A Study of Japanese Lexical Complexity

Adam Nohejl, Akio Hayakawa, Yusuke Ide, Taro Watanabe

arXiv:2410.18567 · Has Code · Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

Originality: Incremental advance

AI Analysis

This work addresses the personalization of lexical complexity prediction for language learners, but it is incremental: it builds on existing datasets and methods and reports limited improvements.

The study examines the assumption that lexical complexity is uniform across a population by analyzing a Japanese dataset. It finds that native Chinese speakers perceive complexity differently because of Sino-Japanese vocabulary. It also shows that models trained on group-mean ratings perform similarly to individual models in complex word identification but poorly in lexical complexity prediction for individuals.

The tasks of lexical complexity prediction (LCP) and complex word identification (CWI) commonly presuppose that difficult-to-understand words are shared by the target population. Meanwhile, personalization methods have also been proposed to adapt models to individual needs. We verify that a recent Japanese LCP dataset is representative of its target population by partially replicating the annotation. Through a second reannotation, we show that native Chinese speakers perceive the complexity differently due to Sino-Japanese vocabulary. To explore the possibilities of personalization, we compare competitive baselines trained on the group mean ratings and individual ratings in terms of performance for an individual. We show that the model trained on a group mean performs similarly to an individual model in the CWI task, while achieving good LCP performance for an individual is difficult. We also experiment with adapting a finetuned BERT model, which results only in marginal improvements across all settings.
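As a rough illustration of the comparison described in the abstract, the sketch below scores a group-mean prediction against each individual's ratings under an LCP-style metric (mean absolute error) and a CWI-style metric (accuracy after thresholding). Everything in it is an assumption made for illustration: the annotator names, the toy ratings, the 0 to 1 scale, the 0.5 threshold, and the choice of metrics are not taken from the paper, and the raw group mean stands in for a trained model's output.

```python
from statistics import mean

# Hypothetical toy data: complexity ratings for 5 words on an assumed 0..1
# scale, one list per annotator. The values are invented for illustration.
ratings = {
    "annotator_a": [0.0, 0.0, 0.5, 1.0, 1.0],
    "annotator_b": [0.0, 0.5, 0.5, 1.0, 0.5],
    "annotator_c": [0.0, 0.0, 0.0, 0.5, 1.0],
}

def group_mean(ratings):
    """Per-word mean rating across all annotators."""
    return [mean(word) for word in zip(*ratings.values())]

def mae(pred, gold):
    """Mean absolute error, used here as a stand-in LCP metric."""
    return mean(abs(p - g) for p, g in zip(pred, gold))

def cwi_accuracy(pred, gold, threshold=0.5):
    """Binarize both sides at an assumed threshold and compare the labels."""
    return mean(
        float((p >= threshold) == (g >= threshold)) for p, g in zip(pred, gold)
    )

# The group mean plays the role of a group-level model's predictions.
group = group_mean(ratings)
for name, individual in ratings.items():
    print(name, round(mae(group, individual), 3), cwi_accuracy(group, individual))
```

Note that in this toy setup each individual's ratings also contribute to the group mean; a real evaluation would need to keep the target individual's data separate from the data used to fit the group-level model.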
