LG, ML | Feb 13, 2025

Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?

Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin

arXiv:2502.08991v220.511 citations | h-index: 5 | ICML

Originality: Highly original

AI Analysis

The paper addresses the fundamental question of when learning from a limited set of tasks can scale to a large task family, with implications for the efficiency of AI systems. Its step from theoretical insight to practical demonstration is incremental.

The paper studies task generalization in large language models. It shows that training on a small set of tasks can, in theory, generalize to an exponentially larger task family. It validates this empirically on sparse parity functions, where Transformers achieve such generalization using in-context learning and chain-of-thought reasoning.

Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of autoregressive compositional structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $D$ subtasks. This yields a total class of size $D^T$. We first show that generalization to all $D^T$ tasks is theoretically achievable by training on only $\widetilde{O}(D)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via In-context Learning (ICL) and chain-of-thought (CoT) reasoning. We further show generalization in arithmetic and translation, beyond parity functions.
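The compositional structure described in the abstract can be illustrated with a small sketch. This is not the paper's code. It assumes a simplified parity setup in which each of the $T$ steps picks one of $D$ input coordinates (repeats allowed, so the class has exactly $D^T$ members), and the chain of thought is the running XOR after each step. The names `parity_with_cot`, `D`, `T`, and `all_tasks` are hypothetical and chosen only for illustration.

```python
from itertools import product


def parity_with_cot(x, secret_indices):
    """Return the running XOR after each step (an illustrative chain of thought).

    x: list of bits (0 or 1).
    secret_indices: tuple of T coordinate indices defining the task (assumed setup).
    """
    steps = []
    acc = 0
    for i in secret_indices:
        acc ^= x[i]        # one operation: XOR in the chosen coordinate
        steps.append(acc)  # intermediate result exposed as a CoT token
    return steps


# Assumed toy sizes: D subtasks (coordinates) per step, T steps per task.
D, T = 4, 3

# Every task is a length-T tuple of subtask choices, giving D**T tasks in total.
all_tasks = list(product(range(D), repeat=T))

x = [1, 0, 1, 1]
task = (0, 2, 3)
cot = parity_with_cot(x, task)
answer = cot[-1]  # the final CoT entry is the parity of the selected bits
```

In this toy setting, the task family has $4^3 = 64$ members even though each step only ever chooses among 4 subtasks, which is the gap between $D$ and $D^T$ that the abstract refers to.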
