Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
This work addresses the challenge of building communicative small language models for dialogue applications, but its contribution is incremental because it relies on existing fine-tuning methods.
The study examined whether small language models pre-trained only on dialogue data can be effective. The models underperformed on most standard BabyLM benchmarks but excelled at dialogue continuation prediction in a minimal pair setting. PPO fine-tuning had mixed to adversarial effects, while DPO fine-tuning further improved performance on a custom dialogue benchmark.
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to enforce "more communicative" text generations by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adversarial effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.
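The minimal pair setting mentioned above can be illustrated with a short sketch. It assumes each item consists of a dialogue context, a matching continuation, and a mismatched one, and that a model counts as correct when it scores the matching continuation strictly higher. The names `minimal_pair_accuracy` and `toy_overlap_score` are hypothetical, and the word-overlap scorer is only a stand-in for a model's log-probability; none of this is taken from the paper's actual benchmark code.

```python
from typing import Callable, Sequence, Tuple

# Assumed item layout: (context, matching continuation, mismatched continuation).
MinimalPair = Tuple[str, str, str]


def minimal_pair_accuracy(
    pairs: Sequence[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the matching continuation scores strictly higher."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    hits = sum(
        1 for context, good, bad in pairs
        if score(context, good) > score(context, bad)
    )
    return hits / len(pairs)


def toy_overlap_score(context: str, continuation: str) -> float:
    """Hypothetical stand-in for a model log-probability.

    Counts how many words of the continuation also occur in the context.
    A real evaluation would instead sum the model's token log-probabilities
    of the continuation given the context.
    """
    context_words = set(context.lower().split())
    return float(sum(w in context_words for w in continuation.lower().split()))


if __name__ == "__main__":
    demo_pairs = [
        ("do you want the red ball", "yes the red ball", "no thanks"),
        ("where is the dog", "the dog is outside", "i like cheese"),
    ]
    print(minimal_pair_accuracy(demo_pairs, toy_overlap_score))
```

Requiring a strictly higher score means that ties count as errors, which is a conservative design choice for this sketch. Swapping `toy_overlap_score` for a function that queries a language model leaves the rest unchanged.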