LG, IT · Jan 28, 2024

An Information-Theoretic Analysis of In-Context Learning

Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

arXiv:2401.15530v1 · 25.7 · 40 citations · h-index: 7 · ICML

Originality: Highly original

AI Analysis

This work provides a foundational theoretical framework for understanding in-context learning in transformers. It avoids the contrived assumptions, such as mixing-time conditions, on which prior analyses relied.

The paper analyzes in-context learning theoretically by introducing new information-theoretic tools that decompose error into irreducible, meta-learning, and intra-task components. These tools unify analyses across meta-learning problems, and the authors apply them to establish new results on how error in transformers decays with the number of training sequences and with sequence length.
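A split into meta-learning and intra-task terms separates what is learned about structure shared across tasks from what is learned within a single task. The minimal sketch below assumes a hypothetical two-level discrete model: a meta-parameter psi shared across tasks, a task parameter theta drawn given psi, and T binary observations drawn given theta. It computes, by exact enumeration, the chain rule of mutual information I(X; psi, theta) = I(X; psi) + I(X; theta | psi). The model, its probabilities, and the names `joint` and `decompose` are illustrative assumptions, not the paper's construction. The sketch only shows the kind of identity that lets total information split into a meta-level term and a within-task term.

```python
import itertools
import math

# Hypothetical toy hierarchy (an assumption for illustration, not the paper's model):
#   psi ~ Uniform{0, 1}                              (meta-parameter shared across tasks)
#   theta | psi: equals psi w.p. 0.8, else 1 - psi   (task parameter)
#   X_1..X_T | theta ~ i.i.d. Bernoulli(0.9 if theta == 1 else 0.2)
P_PSI = {0: 0.5, 1: 0.5}


def p_theta_given_psi(theta, psi):
    return 0.8 if theta == psi else 0.2


def p_x_given_theta(x, theta):
    p1 = 0.9 if theta == 1 else 0.2
    return p1 if x == 1 else 1.0 - p1


def joint(T):
    """Exact joint distribution as a dict mapping (psi, theta, xs) -> probability."""
    table = {}
    for psi in (0, 1):
        for theta in (0, 1):
            for xs in itertools.product((0, 1), repeat=T):
                p = P_PSI[psi] * p_theta_given_psi(theta, psi)
                for x in xs:
                    p *= p_x_given_theta(x, theta)
                table[(psi, theta, xs)] = p
    return table


def entropy(dist):
    """Shannon entropy in nats of a dict of probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def marginal(table, key_fn):
    """Sum the joint over everything not kept by key_fn."""
    out = {}
    for key, p in table.items():
        k = key_fn(key)
        out[k] = out.get(k, 0.0) + p
    return out


def decompose(T):
    """Return (I(X; psi, theta), I(X; psi), I(X; theta | psi)) in nats."""
    table = joint(T)
    h_x = entropy(marginal(table, lambda k: k[2]))
    h_psi = entropy(marginal(table, lambda k: k[0]))
    h_psi_x = entropy(marginal(table, lambda k: (k[0], k[2])))
    h_psi_theta = entropy(marginal(table, lambda k: (k[0], k[1])))
    h_all = entropy(table)

    h_x_given_psi = h_psi_x - h_psi                  # H(X | psi)
    h_x_given_psi_theta = h_all - h_psi_theta        # H(X | psi, theta)

    total = h_x - h_x_given_psi_theta                # I(X; psi, theta)
    meta = h_x - h_x_given_psi                       # I(X; psi)
    intra = h_x_given_psi - h_x_given_psi_theta      # I(X; theta | psi)
    return total, meta, intra
```

For example, `total, meta, intra = decompose(4)` returns three nonnegative numbers, with `total` equal to `meta + intra` up to floating-point error. Enumeration is exponential in T, which is acceptable only because the model is a toy.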

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
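To make "error decays with sequence length" concrete in the simplest case, the sketch below assumes a single hypothetical task with parameter theta ~ Uniform[0, 1] and tokens X_t ~ Bernoulli(theta) drawn i.i.d. This is a toy, not the paper's transformer setting. For this model, the expected cumulative excess log-loss of the Bayes-optimal next-token predictor over a predictor that knows theta equals I(X_{1:T}; theta). Dividing by T gives a per-token error that shrinks as T grows. The function name `mutual_information_nats` is illustrative.

```python
import math


def mutual_information_nats(T):
    """I(X_{1:T}; theta) in nats for theta ~ Uniform[0, 1], X_t | theta ~ i.i.d. Bernoulli(theta)."""
    # Under the uniform prior, a particular sequence with k ones has marginal probability
    # 1 / ((T + 1) * C(T, k)), and each count k in 0..T has total probability 1 / (T + 1).
    h_x = sum(math.log((T + 1) * math.comb(T, k)) for k in range(T + 1)) / (T + 1)
    # H(X_{1:T} | theta) = T * E[binary entropy of theta] = T / 2 nats under Uniform[0, 1].
    h_x_given_theta = T / 2
    return h_x - h_x_given_theta


def per_token_error(T):
    """Average excess log-loss per token of the Bayes-optimal predictor (requires T >= 1)."""
    return mutual_information_nats(T) / T


for T in (1, 10, 100, 1000):
    print(T, per_token_error(T))
```

The total I(X_{1:T}; theta) grows only logarithmically in T, so its per-token average falls toward zero. The toy covers a single task and does not model the dependence on the number of training sequences.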

