CL, Feb 23

BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop

Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Jaap Jumelet, Tal Linzen, Aaron Mueller, Suchir Salhan, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox

IBM

arXiv:2602.20092v21.61 citations

h-index: 34

Originality: Synthesis-oriented

AI Analysis

This workshop targets researchers in AI and cognitive science, promoting connections between cognitive modeling and efficient language model training. It is incremental in that it builds on the challenges of previous years.

The BabyLM Workshop 2026 calls for papers on cognitive modeling and language model pretraining, introducing a new multilingual track for English, Dutch, and Chinese alongside existing small-data challenges.

The goal of BabyLM is to stimulate new research connections between cognitive modeling and language model pretraining. We invite contributions in this vein to the BabyLM Workshop, which will also include the 4th iteration of the BabyLM Challenge. As in previous years, the challenge features two "standard" tracks (Strict and Strict-Small), in which participants must train language models on under 100M or 10M words of data, respectively. This year, we move beyond our previous English-only pretraining datasets with a new Multilingual track, focusing on English, Dutch, and Chinese. For the workshop, we call for papers related to the overall theme of BabyLM, which includes training efficiency, small-scale training datasets, cognitive modeling, model evaluation, and architecture innovation.
