CL AI | Oct 14, 2025

CurLL: A Developmental Framework to Evaluate Continual Learning in Language Models

Pavan Kalyan, Shubhra Mishra, Satya Lokam, Navin Goyal

arXiv:2510.13008v1 | 4.91 citations | h-index: 4 | Proceedings of the First BabyLM Workshop

Originality: Incremental advance

AI Analysis

This work provides a systematic evaluation framework for continual learning in language models, addressing the need for benchmarks with controlled, developmentally ordered skill progression.

The authors address the evaluation of continual learning in language models by introducing CurLL, a dataset and benchmark based on human developmental stages from ages 5-10. Training a 135M-parameter transformer under independent, joint, and sequential (continual) setups revealed trade-offs in skill retention and transfer efficiency.

We introduce a comprehensive continual learning dataset and benchmark (CurLL) grounded in human developmental trajectories from ages 5-10, enabling systematic and fine-grained assessment of models' ability to progressively acquire new skills. CurLL spans five developmental stages (0-4) covering ages 5-10, supported by a skill graph that breaks down broad skills into smaller abilities, concrete goals, and measurable indicators, while also capturing which abilities build on others. We generate a 23.4B-token synthetic dataset with controlled skill progression, vocabulary complexity, and format diversity, comprising paragraphs, comprehension-based QA (CQA), skill-testing QA (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12B to 6.78B tokens, supporting precise analysis of forgetting, forward transfer, and backward transfer. Using a 135M-parameter transformer trained under independent, joint, and sequential (continual) setups, we show trade-offs in skill retention and transfer efficiency. By mirroring human learning patterns and providing fine-grained control over skill dependencies, this work advances continual learning evaluations for language models.
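The abstract names forgetting, forward transfer, and backward transfer without defining them. As a minimal sketch, the Python below computes the three quantities using definitions common in the continual learning literature; these are an assumption and may differ from the paper's exact formulas. The matrix `R`, the `baseline` vector, and all numbers are hypothetical illustrations, not results from the paper. Here `R[i][j]` stands for the score on stage `j` after sequentially training through stage `i`, and `baseline[j]` for the score on stage `j` of a reference model that never saw it.

```python
from statistics import mean


def backward_transfer(R):
    """Average change on earlier stages after training through the final stage.

    Negative values mean later training hurt earlier stages.
    """
    T = len(R)
    return mean(R[T - 1][j] - R[j][j] for j in range(T - 1))


def forward_transfer(R, baseline):
    """Average gain on a stage just before it is trained on, relative to a baseline."""
    T = len(R)
    return mean(R[j - 1][j] - baseline[j] for j in range(1, T))


def forgetting(R):
    """Average drop from the best score ever reached on a stage to its final score."""
    T = len(R)
    return mean(
        max(R[i][j] for i in range(j, T - 1)) - R[T - 1][j]
        for j in range(T - 1)
    )


# Hypothetical scores for three stages (the benchmark itself has five).
R_toy = [
    [0.80, 0.30, 0.10],  # after training on stage 0
    [0.70, 0.75, 0.20],  # after training on stages 0-1
    [0.60, 0.70, 0.65],  # after training on stages 0-2
]
baseline_toy = [0.10, 0.10, 0.10]  # hypothetical untrained-model scores

print("backward transfer:", backward_transfer(R_toy))
print("forward transfer:", forward_transfer(R_toy, baseline_toy))
print("forgetting:", forgetting(R_toy))
```

In this framing, the sequential setup fills in the whole matrix, while independent or joint training runs could supply the baseline or an upper reference, depending on how the comparison is set up.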
