Algorithmic Capture, Computational Complexity, and Inductive Bias of Infinite Transformers

arXiv:2603.11161v1 · 6.9 · h-index: 15

Predicted impact: top 60% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For researchers in machine learning theory, this addresses a fundamental question: what are the computational limits and inductive biases of transformers? The contribution is incremental, however, because it builds on existing work on expressivity and complexity.

The paper tackles the problem of distinguishing true algorithmic learning from statistical interpolation in neural networks. It defines Algorithmic Capture and analyzes infinite-width transformers, showing that they have an inductive bias towards low-complexity algorithms in the EPTHS class. This bias prevents them from capturing higher-complexity algorithms while still allowing them to succeed on simpler tasks such as search, copy, and sort.

We formally define Algorithmic Capture (i.e., "grokking" an algorithm) as the ability of a neural network to generalize to arbitrary problem sizes ($T$) with controllable error and minimal sample adaptation, distinguishing true algorithmic learning from statistical interpolation. By analyzing infinite-width transformers in both the lazy and rich regimes, we derive upper bounds on the inference-time computational complexity of the functions these networks can learn. We show that despite their universal expressivity, transformers possess an inductive bias towards low-complexity algorithms within the Efficient Polynomial Time Heuristic Scheme (EPTHS) class. This bias effectively prevents them from capturing higher-complexity algorithms, while allowing success on simpler tasks like search, copy, and sort.
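As a rough illustration of the "arbitrary problem sizes ($T$) with controllable error" part of the definition, the sketch below probes a model's error on problem sizes larger than those seen in training. It is not the authors' formal criterion: it assumes a fixed error tolerance `eps` in place of "controllable error", ignores the "minimal sample adaptation" component entirely, and can only test finitely many sizes. All names (`sort_task`, `capture_curve`, `looks_captured`, `algorithmic_model`, `interpolating_model`, `T_TRAIN`) are hypothetical. Sorting is used because the abstract lists it among the tasks transformers can succeed on.

```python
import random
from typing import Callable, Dict, List, Sequence

Model = Callable[[List[int]], List[int]]

def sort_task(xs: Sequence[int]) -> List[int]:
    """Ground-truth algorithm for the hypothetical task: sort a list of length T."""
    return sorted(xs)

def error_rate(model: Model, task: Callable, T: int, n_samples: int,
               rng: random.Random) -> float:
    """Fraction of random size-T instances where the model disagrees with the task."""
    wrong = 0
    for _ in range(n_samples):
        xs = [rng.randrange(100) for _ in range(T)]
        if model(list(xs)) != task(xs):
            wrong += 1
    return wrong / n_samples

def capture_curve(model: Model, task: Callable, sizes: Sequence[int],
                  n_samples: int = 200, seed: int = 0) -> Dict[int, float]:
    """Empirical error as a function of problem size T (a finite proxy only)."""
    rng = random.Random(seed)
    return {T: error_rate(model, task, T, n_samples, rng) for T in sizes}

def looks_captured(curve: Dict[int, float], eps: float) -> bool:
    """Assumed proxy: error stays within eps at every tested size, including T > T_TRAIN."""
    return all(err <= eps for err in curve.values())

# Toy model that has "captured" the algorithm: correct at every size.
def algorithmic_model(xs: List[int]) -> List[int]:
    return sorted(xs)

# Toy "statistical interpolator": correct only up to the training size.
T_TRAIN = 8

def interpolating_model(xs: List[int]) -> List[int]:
    if len(xs) <= T_TRAIN:
        return sorted(xs)
    return list(xs)  # beyond training sizes it just echoes the input

if __name__ == "__main__":
    sizes = [4, 8, 16, 64, 256]
    print(looks_captured(capture_curve(algorithmic_model, sort_task, sizes), eps=0.01))    # True
    print(looks_captured(capture_curve(interpolating_model, sort_task, sizes), eps=0.01))  # False
```

Both toy models are perfect on the training sizes, so in-distribution accuracy alone cannot separate them; only the error at sizes $T > T_{\text{train}}$ does, which is the distinction between algorithmic learning and interpolation that the definition targets.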
