CL · LG · May 12

What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty

arXiv:2605.12281
Predicted impact: top 54% in CL (last 90 days)
Originality: Incremental advance
AI Analysis

The paper provides interpretable, L1-tailored difficulty estimates that can be used to design vocabulary curricula for English learners.

The study models English vocabulary difficulty for learners whose first language is Spanish, German, or Chinese. Word familiarity is the dominant feature group for all three. Predictions for Spanish- and German-speaking learners additionally rely on orthographic transfer, whereas difficulty for Chinese-speaking learners is shaped by familiarity and surface features alone.

What makes a word difficult to learn, and how does the difficulty depend on the learner's native language? We computationally model vocabulary difficulty for English learners whose first language is Spanish, German, or Chinese with gradient-boosted models trained on features related to a word's familiarity (e.g., frequency), meaning, surface form, and cross-linguistic transfer. Using Shapley values, we determine the importance of each feature group. Word familiarity is the dominant feature group shared by all three languages. However, predictions for Spanish- and German-speaking learners rely additionally on orthographic transfer. This transfer mechanism is unavailable to Chinese learners, whose difficulty is shaped by a combination of familiarity and surface features alone. Our models provide interpretable, L1-tailored difficulty estimates that can be used to design vocabulary curricula.
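The group-level attribution described in the abstract can be illustrated with a minimal sketch. It computes exact Shapley values over four feature groups for a toy coalition value function. Everything here is an assumption made for illustration: the function names (`group_shapley`, `toy_value`), the group labels, and all numbers are hypothetical and are not results or code from the paper, which attributes predictions of gradient-boosted models rather than a hand-written value function.

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-alone contribution of each feature group (illustrative numbers only).
BASE = {"familiarity": 0.40, "meaning": 0.05, "surface": 0.10, "transfer": 0.15}
# Hypothetical extra gain when surface and transfer features are both available.
INTERACTION = 0.06


def toy_value(coalition):
    """Toy 'model quality' for a set of feature groups (assumed, not from the paper)."""
    v = sum(BASE[g] for g in coalition)
    if {"surface", "transfer"} <= coalition:
        v += INTERACTION
    return v


def group_shapley(groups, value):
    """Exact Shapley value of each group: weighted average marginal gain over all coalitions."""
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k preceding g in a random ordering.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


if __name__ == "__main__":
    for group, score in group_shapley(list(BASE), toy_value).items():
        print(f"{group}: {score:.3f}")
```

In this toy setup the interaction gain is split evenly between the surface and transfer groups, and the attributions sum to the value of the full feature set. Exact enumeration is only practical for a handful of groups; attributing a trained gradient-boosted model would normally rely on a dedicated tree-explanation library instead.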

