LG, AI, CL, ML · Jul 29, 2023

A Theory for Emergence of Complex Skills in Language Models

MILA
arXiv:2307.15936v2 · 119 citations · h-index: 82
Originality: Incremental advance
AI Analysis

This paper provides a theoretical explanation for emergent abilities in large language models, addressing a key challenge in AI development. The advance is incremental because it builds on existing scaling laws.

The paper asks how new skills emerge in language models as they scale. It proposes a statistical framework that links cross-entropy loss to skill competence, and it shows that scaling laws enable efficient learning through 'slingshot generalization': complex tasks that combine multiple skills emerge at rates similar to those of the basic skills.

A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this "slingshot generalization" since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.
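
As a rough illustration of the kind of empirical Scaling Law the abstract refers to, the sketch below evaluates a Chinchilla-style loss formula, in which loss falls as a power law in both parameter count and training tokens. The functional form and the default constants are assumptions taken from the commonly quoted fit of Hoffmann et al. (2022), not values stated on this page, and the name `chinchilla_loss` is hypothetical.

```python
def chinchilla_loss(n_params, n_tokens,
                    A=406.4, B=410.7, alpha=0.34, beta=0.28, E=1.69):
    """Assumed Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta.

    n_params: number of model parameters (N)
    n_tokens: number of training tokens (D)
    E is the irreducible loss term; the other two terms are the reducible
    (excess) loss. All default constants are illustrative assumptions.
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta


if __name__ == "__main__":
    # Scale model size and data together by 10x at each step.
    for scale in (1, 10, 100):
        n, d = 1e9 * scale, 2e10 * scale
        loss = chinchilla_loss(n, d)
        print(f"N={n:.0e}  D={d:.0e}  loss={loss:.3f}  excess={loss - 1.69:.3f}")
```

The framework described above relates loss to competence on skills. This sketch only shows how the reducible part of the loss shrinks with scale and does not implement that link.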

