LGJun 27, 2024

A Teacher Is Worth A Million Instructions

arXiv:2406.19112v1
Originality Incremental advance
AI Analysis

This addresses the problem of efficiently training smaller LLMs for researchers and practitioners, offering an incremental improvement over existing methods.

The paper tackles the challenge of training smaller LLMs (7B and 13B parameters) by using knowledge from larger models as teachers and a post-training domain alignment phase, resulting in Mistral 7B and 2x7B models achieving up to 7.9 on MT-Bench and 93.04% on AlpacaEval.

Large Language Models(LLMs) have shown exceptional abilities, yet training these models can be quite challenging. There is a strong dependence on the quality of data and finding the best instruction tuning set. Further, the inherent limitations in training methods create substantial difficulties to train relatively smaller models with 7B and 13B parameters. In our research, we suggest an improved training method for these models by utilising knowledge from larger models, such as a mixture of experts (8x7B) architectures. The scale of these larger models allows them to capture a wide range of variations from data alone, making them effective teachers for smaller models. Moreover, we implement a novel post-training domain alignment phase that employs domain-specific expert models to boost domain-specific knowledge during training while preserving the model's ability to generalise. Fine-tuning Mistral 7B and 2x7B with our method surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to $7.9$ in MT-Bench and $93.04\%$ on AlpacaEval.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes