CL · AI · Dec 8, 2021

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

arXiv:2112.11446v2 · 1615 citations
Originality: Incremental advance
AI Analysis

This work addresses how to understand and improve the performance of large language models, a problem relevant to AI researchers and practitioners. It is an incremental advance because it builds on existing scaling methods.

The paper analyzes Transformer-based language models of increasing size, up to the 280-billion-parameter Gopher, across 152 tasks. It reports state-of-the-art performance on the majority of them. Scale yields the largest gains in reading comprehension, fact-checking, and toxic language identification, and smaller gains in logical and mathematical reasoning.

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

Code Implementations: 3 repos