CL · Oct 23, 2025

Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)

arXiv:2510.20358v1 · 2 citations · h-index: 18 · Proceedings of the First BabyLM Workshop
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating communicative small language models for dialogue applications. It is an incremental advance, since it builds on existing fine-tuning methods.

The study examined whether small language models pre-trained only on dialogue data can be effective. The models underperformed on most standard BabyLM benchmarks but excelled at dialogue continuation prediction in a minimal pair setting. DPO fine-tuning further improved their performance on a custom dialogue benchmark, whereas PPO fine-tuning had mixed to adversarial effects.

We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to enforce "more communicative" text generations by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adversarial effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.

