CL, AI, LG | Jan 27, 2023

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

arXiv:2301.11916v4 | 185 citations | h-index: 63
Originality: Incremental advance
AI Analysis

This work addresses the problem of optimizing few-shot demonstrations for in-context learning in LLMs. It offers a practical solution for users, but the advance is incremental because it builds on existing Bayesian and latent variable concepts.

The authors address the sensitivity of in-context learning in large language models to demonstration selection by viewing LLMs as latent variable models and proposing a Bayesian selection algorithm. They report significant improvements over baselines, averaged over eight GPT models on eight text classification datasets, and show real-world utility on GSM8K.

In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises from regular language model pretraining objectives remain disconnected from the real-world LLMs. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LM, and then directly generalize the selected demonstrations to larger LMs. We demonstrate significant improvement over baselines, averaged over eight GPT models on eight real-world text classification datasets. We also demonstrate the real-world usefulness of our algorithm on GSM8K, a math word problem dataset. Our empirical findings support our hypothesis that LLMs implicitly infer a latent variable containing task information.
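The selection step described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the abstract does not say how a demonstration is scored, so the `score` callable below is a hypothetical stand-in for whatever quantity the small LM provides, and `select_demonstrations`, `build_prompt`, and the "Input/Label" prompt template are likewise names and formats assumed for illustration. The sketch shows only the overall shape: score each annotated example once, keep the top `k`, and reuse the same selected demonstrations in a prompt for any larger LM.

```python
from typing import Callable, List, Sequence, Tuple

# An annotated example: (input text, label). Hypothetical representation.
Demo = Tuple[str, str]


def select_demonstrations(
    pool: Sequence[Demo],
    score: Callable[[Demo], float],
    k: int,
) -> List[Demo]:
    """Return the k highest-scoring demonstrations from the annotated pool.

    `score` is an assumed placeholder for a score computed with a small LM;
    higher means a better demonstration. Ties keep their original pool order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[:k]


def build_prompt(demos: Sequence[Demo], query: str) -> str:
    """Format selected demonstrations plus a query as a few-shot prompt.

    The "Input/Label" template is an assumption, not taken from the paper.
    """
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    pool = [
        ("great movie", "positive"),
        ("terrible plot and worse acting", "negative"),
        ("fine", "positive"),
    ]
    # Toy scorer for demonstration only: prefers longer inputs.
    toy_score = lambda demo: float(len(demo[0]))
    chosen = select_demonstrations(pool, toy_score, k=2)
    print(build_prompt(chosen, "a surprisingly good sequel"))
```

Because the selection depends only on the scorer and the annotated pool, the returned list can be passed unchanged to `build_prompt` for a different, larger model, which mirrors the transfer the abstract describes.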

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
