CL | HEP-TH | HO | COMP-PH | Jul 11, 2023

Large Language Models

Harvard
arXiv:2307.05782v21143 citations | h-index: 72
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey aimed at readers with a background in mathematics or physics, offering a detailed overview without introducing new methods or data.

The paper provides a survey of large language models (LLMs), describing their history, state-of-the-art developments, and underlying transformer architecture, while exploring how these models trained for next-word prediction can perform intelligent tasks.

Artificial intelligence is making spectacular progress, and one of the best examples is the development of large language models (LLMs) such as OpenAI's GPT series. In these lectures, written for readers with a background in mathematics or physics, we give a brief history and survey of the state of the art, and describe the underlying transformer architecture in detail. We then explore some current ideas on how LLMs work and how models trained to predict the next word in a text are able to perform other tasks displaying intelligence.

