CL | Oct 23, 2025

Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs

arXiv:2510.20475v1 | 1 citation | h-index: 2 | Proceedings of the First BabyLM Workshop
Originality: Incremental advance
AI Analysis

This work develops more effective pretraining methods for resource-constrained language models. The advance is incremental, since it builds on existing MLM techniques.

The authors improve pretraining efficiency for small language models by optimizing masked language modeling with adaptive token masking and sub-token embeddings. The approach yields a substantial performance increase on (Super)GLUE tasks over standard MLM and beats the baseline in the BabyLM Challenge strict-small track.

We describe our strategy for the 2025 edition of the BabyLM Challenge. Our main contribution is that of an improved form of Masked Language Modeling (MLM), which adapts the probabilities of the tokens masked according to the model's ability to predict them. The results show a substantial increase in performance on (Super)GLUE tasks over the standard MLM. We also incorporate sub-token embeddings, finding that this increases the model's morphological generalization capabilities. Our submission beats the baseline in the strict-small track.
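
The abstract does not spell out how masking probabilities are adapted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a running per-token difficulty score (an exponential moving average of the MLM loss observed for each token id) and masks positions in proportion to that score while keeping the expected masking rate at a fixed budget. The function names, the `momentum` and `mask_rate` parameters, and the EMA update rule are all hypothetical choices made for illustration.

```python
import random


def update_difficulty(difficulty, token_ids, losses, momentum=0.9):
    """Update a running per-token difficulty score (hypothetical EMA rule).

    difficulty: dict mapping token id -> smoothed loss seen for that token.
    token_ids, losses: masked token ids and the model's loss on each of them.
    """
    for tok, loss in zip(token_ids, losses):
        prev = difficulty.get(tok, loss)  # first sighting: start at the observed loss
        difficulty[tok] = momentum * prev + (1.0 - momentum) * loss
    return difficulty


def masking_probabilities(token_ids, difficulty, mask_rate=0.15, default=1.0):
    """Per-position masking probabilities proportional to token difficulty.

    Probabilities are rescaled so that their sum equals
    mask_rate * len(token_ids), then clipped to at most 1.0.
    Tokens never seen before get the assumed `default` difficulty.
    """
    weights = [difficulty.get(tok, default) for tok in token_ids]
    total = sum(weights)
    if total == 0:
        # Nothing is considered hard: fall back to uniform masking.
        return [mask_rate] * len(token_ids)
    scale = mask_rate * len(token_ids) / total
    return [min(1.0, w * scale) for w in weights]


def sample_mask(probs, rng=random):
    """Draw an independent Bernoulli mask decision for each position."""
    return [rng.random() < p for p in probs]


# Usage: token 1 has been hard to predict, token 2 easy.
difficulty = {1: 4.0, 2: 1.0}
sequence = [1, 2, 2, 2]
probs = masking_probabilities(sequence, difficulty)
mask = sample_mask(probs, random.Random(0))
```

Under this assumed scheme, tokens the model already predicts well are masked less often, so more of the fixed masking budget goes to tokens that still carry a high loss. Standard MLM corresponds to the special case where every token has the same difficulty.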

