CL · Jul 28, 2022

Efficient Training of Language Models to Fill in the Middle

arXiv:2207.14255v1 · 297 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, simple training methods that improve language models' infilling abilities. The advance is incremental because it builds on existing data augmentation techniques.

The authors demonstrated that autoregressive language models can be trained to fill in missing text spans without harming their original left-to-right generation capabilities, as shown by perplexity and sampling evaluations across various scales. They provided extensive ablations on hyperparameters to establish best practices and released a model and benchmarks for future research.

We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research.
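The transformation described above, moving a span from the middle of a document to its end, can be illustrated with a minimal sketch. Several details here are assumptions made for illustration and are not taken from the abstract: the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are placeholders, the span is chosen by two uniformly random character-level cut points, and the `fim_rate` default of 0.5 is arbitrary (the abstract says only that the transformation frequency and span selection method were ablated). The helper names `fim_transform` and `fim_restore` are hypothetical.

```python
import random

# Placeholder sentinel strings (an assumption; the abstract does not name them).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, move a random middle span of doc to its end."""
    if rng.random() >= fim_rate:
        # Leave the document as an ordinary left-to-right training example.
        return doc
    # Two random character-level cut points split doc into prefix/middle/suffix.
    lo, hi = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:lo], doc[lo:hi], doc[hi:]
    # The middle span now comes last, so a left-to-right model learns to
    # generate it conditioned on both the prefix and the suffix.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_restore(example: str) -> str:
    """Invert fim_transform; assumes doc itself contains no sentinel strings."""
    if not example.startswith(PRE):
        return example
    prefix, rest = example[len(PRE):].split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "def add(a, b):\n    return a + b\n"
    transformed = fim_transform(doc, rng, fim_rate=1.0)
    print(transformed)
    print(fim_restore(transformed) == doc)
```

At inference time, under the same assumptions, a prompt of the form `<PRE>prefix<SUF>suffix<MID>` would ask the model to generate the missing middle by ordinary left-to-right sampling.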

Code Implementations: 4 repos