CL, AI | Oct 14, 2025

CurLL: A Developmental Framework to Evaluate Continual Learning in Language Models

arXiv:2510.13008v1 | 1 citation | h-index: 4 | Proceedings of the First BabyLM Workshop
Originality: Incremental advance
AI Analysis

This work provides a systematic evaluation framework for continual learning in language models, addressing a domain-specific need for better benchmarks in AI education and development.

The authors tackled the problem of evaluating continual learning in language models by introducing CurLL, a dataset and benchmark based on human developmental stages from ages 5-10, which showed trade-offs in skill retention and transfer efficiency when training a 135M-parameter transformer under different setups.

We introduce a comprehensive continual learning dataset and benchmark (CurLL) grounded in human developmental trajectories from ages 5-10, enabling systematic and fine-grained assessment of models' ability to progressively acquire new skills. CurLL spans five developmental stages (0-4) covering ages 5-10, supported by a skill graph that breaks down broad skills into smaller abilities, concrete goals, and measurable indicators, while also capturing which abilities build on others. We generate a 23.4B-token synthetic dataset with controlled skill progression, vocabulary complexity, and format diversity, comprising paragraphs, comprehension-based QA (CQA), skill-testing QA (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12B to 6.78B tokens, supporting precise analysis of forgetting, forward transfer, and backward transfer. Using a 135M-parameter transformer trained under independent, joint, and sequential (continual) setups, we show trade-offs in skill retention and transfer efficiency. By mirroring human learning patterns and providing fine-grained control over skill dependencies, this work advances continual learning evaluations for language models.
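The abstract names forgetting, forward transfer, and backward transfer but does not define them. As a rough illustration, the sketch below computes the definitions commonly used in the continual learning literature from a stage-by-stage score matrix. It makes several assumptions: the paper's own metric definitions may differ, the names `acc`, `baseline`, `EXAMPLE_ACC`, and `EXAMPLE_BASELINE` are hypothetical, and the numbers are made up for a three-stage toy case rather than taken from the paper.

```python
from typing import List

Matrix = List[List[float]]

# acc[i][j] = score on stage j's test set after sequentially training
# through stage i. Illustrative values only, NOT results from the paper.
EXAMPLE_ACC: Matrix = [
    [0.80, 0.30, 0.10],
    [0.70, 0.75, 0.25],
    [0.65, 0.70, 0.85],
]
# Hypothetical score of an untrained model on each stage.
EXAMPLE_BASELINE: List[float] = [0.10, 0.10, 0.10]


def backward_transfer(acc: Matrix) -> float:
    """Mean change on earlier stages between just-learned and final model.

    Negative values indicate that later training hurt earlier stages.
    """
    t = len(acc)
    return sum(acc[t - 1][j] - acc[j][j] for j in range(t - 1)) / (t - 1)


def forgetting(acc: Matrix) -> float:
    """Mean drop from the best earlier score on each stage to the final score.

    Assumption: the best score is searched from the checkpoint where the
    stage was first trained (i >= j), a common variant of this metric.
    """
    t = len(acc)
    drops = []
    for j in range(t - 1):
        best = max(acc[i][j] for i in range(j, t - 1))
        drops.append(best - acc[t - 1][j])
    return sum(drops) / (t - 1)


def forward_transfer(acc: Matrix, baseline: List[float]) -> float:
    """Mean gain on a stage before training on it, relative to a baseline."""
    t = len(acc)
    return sum(acc[j - 1][j] - baseline[j] for j in range(1, t)) / (t - 1)


if __name__ == "__main__":
    print("BWT:", backward_transfer(EXAMPLE_ACC))
    print("Forgetting:", forgetting(EXAMPLE_ACC))
    print("FWT:", forward_transfer(EXAMPLE_ACC, EXAMPLE_BASELINE))
```

In this layout each row is a checkpoint of the sequential (continual) setup and each column is a stage's evaluation set, so the diagonal holds the score right after a stage is learned and the last row holds the final model's scores.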

