CL | Nov 11, 2025

Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?

arXiv:2511.08199v1 | 1 citation | h-index: 19 | Proceedings of the First BabyLM Workshop
Originality: Synthesis-oriented
AI Analysis

This work asks whether syntactic information about the training data can improve curriculum learning for language models. Its contribution is incremental, building on existing developmental and cognitively inspired approaches.

The study investigates whether syntactic categories aid curriculum learning for language models. Some curricula help on reading tasks, but the main performance improvement comes from training on the syntactically categorizable subset of the data rather than the full noisy corpus. Syntactic knowledge of the training data also helps in interpreting model performance on linguistic tasks.

We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore developmental and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvement comes from using the subset of syntactically categorizable data, rather than the full noisy corpus.

