BabyLM's First Constructions: Causal probing provides a signal of learning
This work addresses the relevance of language models to human language learning by demonstrating construction learning from developmentally realistic data, although it is incremental because it builds on existing methods.
The study investigated whether language models trained on developmentally plausible data learn constructions. It found that they acquire diverse constructions, including hard cases that are superficially indistinguishable, and that better representation of constructions correlates with improved benchmark performance.
Construction grammar posits that language learners acquire constructions (form-meaning pairings) from the statistics of their environment. Recent work supports this hypothesis by showing sensitivity to constructions in pretrained language models (PLMs), including one recent study (Rozner et al., 2025) demonstrating that constructions shape RoBERTa's output distribution. However, the models studied have generally been trained on developmentally implausible amounts of data, casting doubt on their relevance to human language learning. Here we use Rozner et al.'s methods to evaluate construction learning in masked language models from the 2024 BabyLM Challenge. Our results show that even when trained on developmentally plausible quantities of data, models learn diverse constructions, including hard cases that are superficially indistinguishable. We further find correlational evidence that constructional performance may be functionally relevant: models that better represent constructions perform better on the BabyLM benchmarks.