LG · AI · Dec 14, 2025

From Small to Large: Generalization Bounds for Transformers on Variable-Size Inputs

arXiv:2512.12805v21 citations
Originality: Incremental advance
AI Analysis

This paper addresses a theoretical gap in machine learning, offering researchers insight into size generalization in Transformers. The advance is incremental because it builds on existing empirical observations.

The paper studies why Transformers generalize from small to large input sizes. It provides a theoretical bound on the error between the Transformer's output on a discrete sample and its continuous-domain equivalent, and experiments on graphs and point clouds confirm the bound's tightness.

Transformers exhibit a notable property of size generalization, demonstrating an ability to extrapolate from smaller token sets to significantly longer ones. This behavior has been documented across diverse applications, including point clouds, graphs, and natural language. Despite its empirical success, this capability still lacks a rigorous theoretical characterization. In this paper, we develop a theoretical framework to analyze this phenomenon for geometric data, which we represent as discrete samples from a continuous source (e.g., point clouds from manifolds, graphs from graphons). Our core contribution is a bound on the error between the Transformer's output for a discrete sample and its continuous-domain equivalent. We prove that for Transformers with stable positional encodings, this bound is determined by the sampling density and the intrinsic dimensionality of the data manifold. Experiments on graphs and point clouds of various sizes confirm the tightness of our theoretical bound.
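The discrete-versus-continuous comparison described in the abstract can be illustrated with a toy experiment. The sketch below is not the paper's construction or its bound. It assumes a single softmax attention head with identity query, key, and value maps, no positional encoding, and the unit circle as a stand-in one-dimensional manifold. All function names (attention_output, circle_points, continuous_reference, mean_error) are hypothetical. The continuous-domain output is approximated by evaluating the same attention on a very dense uniform grid, and the gap to that reference is averaged over random samples of n points.

```python
import math
import random


def attention_output(query, points):
    """Softmax attention at one query, with identity Q/K/V maps (an assumption)."""
    scores = [query[0] * p[0] + query[1] * p[1] for p in points]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return (
        sum(w * p[0] for w, p in zip(weights, points)) / total,
        sum(w * p[1] for w, p in zip(weights, points)) / total,
    )


def circle_points(angles):
    """Map angles to points on the unit circle (the toy 'manifold')."""
    return [(math.cos(a), math.sin(a)) for a in angles]


def continuous_reference(query, grid_size=20000):
    """Approximate the continuous-domain output with a dense uniform grid."""
    angles = [2 * math.pi * k / grid_size for k in range(grid_size)]
    return attention_output(query, circle_points(angles))


def mean_error(query, reference, n, trials=200, seed=0):
    """Average distance between the n-sample output and the reference."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        angles = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
        out = attention_output(query, circle_points(angles))
        total += math.hypot(out[0] - reference[0], out[1] - reference[1])
    return total / trials


if __name__ == "__main__":
    query = (1.0, 0.0)
    reference = continuous_reference(query)
    for n in (16, 64, 256, 1024):
        print(n, mean_error(query, reference, n))
```

Running the script prints the average gap for each sample size. In this toy setting the gap shrinks as n grows, which is the qualitative behavior the abstract attributes to sampling density. The rate observed here reflects only this simplified setup and should not be read as the paper's result.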
