LG ML | Feb 12, 2025

In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation

Frank Cole, Yuxuan Zhao, Yulong Lu, Tianhao Zhang

arXiv:2502.08136v3 | 4.1 | h-index: 3

Originality: Highly original

AI Analysis

This work advances the theoretical understanding of in-context learning and offers machine learning researchers insights into transformer architecture design. It is somewhat incremental, however, because it builds on existing approximation theory.

The paper studies how well transformers can approximate the in-context learning of linear dynamical systems. It shows that multi-layer transformers of logarithmic depth achieve error bounds comparable to those of least-squares estimators, whereas single-layer linear transformers incur a non-diminishing error, revealing a depth-separation phenomenon.

This paper investigates approximation-theoretic aspects of the in-context learning capability of transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an $L^2$-testing loss defined uniformly across tasks. This result demonstrates that transformers with logarithmic depth can achieve error bounds comparable to those of the least-squares estimator. In contrast, our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers, which suggests a depth-separation phenomenon for transformers in the in-context learning of dynamical systems. Moreover, this second result uncovers a critical distinction in the approximation power of single-layer linear transformers when learning from IID versus non-IID data.
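The abstract uses the least-squares estimator as the benchmark for multi-layer transformers. The sketch below is our illustration, not the authors' construction. It assumes the noisy system has the form x_{t+1} = W x_t + eps_t with Gaussian noise. The estimator fits W on the in-context trajectory and uses the fit to predict the next state. Consecutive states along one trajectory are correlated, so this context is non-IID data in the abstract's sense. The function names `simulate_lds` and `least_squares_next_state`, the noise level, and the stable choice W = 0.9 Q (Q orthogonal) are hypothetical.

```python
import numpy as np


def simulate_lds(W, x0, T, noise_std, rng):
    """Roll out x_{t+1} = W x_t + eps_t, eps_t ~ N(0, noise_std^2 I).

    Assumed model form for illustration only.
    """
    d = W.shape[0]
    xs = np.empty((T + 1, d))
    xs[0] = x0
    for t in range(T):
        xs[t + 1] = W @ xs[t] + noise_std * rng.standard_normal(d)
    return xs


def least_squares_next_state(xs):
    """Fit W_hat = argmin_W sum_t ||x_{t+1} - W x_t||^2 on the context,
    then predict the state that follows the last observed one."""
    X_prev = xs[:-1]  # rows x_0, ..., x_{T-1}
    X_next = xs[1:]   # rows x_1, ..., x_T
    # Solve X_prev @ W_hat.T ~= X_next in the least-squares sense.
    W_hat_T, *_ = np.linalg.lstsq(X_prev, X_next, rcond=None)
    W_hat = W_hat_T.T
    return W_hat, W_hat @ xs[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 3
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    W = 0.9 * Q  # hypothetical stable system: every eigenvalue has modulus 0.9
    xs = simulate_lds(W, rng.standard_normal(d), T=2000, noise_std=0.1, rng=rng)
    W_hat, x_pred = least_squares_next_state(xs)
    print("||W_hat - W||_F =", np.linalg.norm(W_hat - W))
```

Using `np.linalg.lstsq` rather than explicitly inverting the Gram matrix keeps the fit numerically stable when the context is short or poorly conditioned. The estimation error shrinks as the trajectory length T grows.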
