EvoLM: In Search of Lost Language Model Training Dynamics
This work addresses the difficulty downstream developers face in evaluating design choices across LM training stages, though its contribution, a systematic analysis tool, is incremental.
The authors tackle the opacity of training dynamics in modern language models by introducing EvoLM, a suite of over 100 models with 1B and 4B parameters. Their analysis reveals insights such as diminishing returns from excessive training and the importance of mitigating forgetting during domain-specific continued pre-training.
Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, a model suite that enables systematic and transparent analysis of LMs' training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning. We train over 100 LMs with 1B and 4B parameters from scratch, and evaluate both upstream (language modeling) and downstream (problem-solving) capabilities, including considerations of both in-domain and out-of-domain generalization. Key insights highlight the diminishing returns from excessive pre-training and post-training, the importance and practices of mitigating forgetting during domain-specific continued pre-training, the crucial role of continued pre-training in bridging pre-training and post-training phases, and various intricate trade-offs when configuring supervised fine-tuning and reinforcement learning. To facilitate open research and reproducibility, we release all pre-trained and post-trained models, training datasets for all stages, and our entire training and evaluation pipeline.