Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees
This work asks whether AI models can learn inherent mathematical structure. It is incremental, building on prior studies of transformer learnability.
The researchers investigated whether a transformer model could learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers, and found that the model partially captured non-trivial regularities and correlations in the sequence.
We study whether a large language model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped to a rooted planar tree, and the resulting sequence $\mathbb{N}\mathcal{T}$ defines an arithmetic text with measurable statistical structure. A transformer network (the GPT-2 architecture) is trained from scratch on the first $10^{11}$ elements, and its predictive ability is then tested on next-word and masked-word prediction tasks. Our results show that the model partially learns the internal grammar of $\mathbb{N}\mathcal{T}$, capturing non-trivial regularities and correlations. This suggests that learnability may extend beyond empirical data to the very structure of arithmetic.
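To make the mapping from integers to trees concrete, the following is a minimal sketch of one possible construction, not necessarily the one used in this work. It assumes that the root of the tree of $n$ has one child per distinct prime factor, ordered by increasing prime, that each child is the tree of the corresponding exponent, and that $1$ maps to a single node. The function names `factorize`, `tree`, and `to_parens`, as well as the balanced-parentheses serialization, are hypothetical choices made for illustration.

```python
def factorize(n):
    """Return the prime factorization of n as (prime, exponent) pairs, primes increasing."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            factors.append((p, e))
        p += 1
    if n > 1:
        # Whatever remains is a single prime with exponent 1.
        factors.append((n, 1))
    return factors


def tree(n):
    """Map n >= 1 to a nested tuple representing a rooted planar tree.

    Assumed convention (for illustration only): one child per distinct
    prime factor, and each child is the tree of that prime's exponent.
    """
    return tuple(tree(e) for _, e in factorize(n))


def to_parens(t):
    """Serialize a tree as a balanced-parentheses string."""
    return "(" + "".join(to_parens(child) for child in t) + ")"


# First few elements of the resulting sequence under this assumed convention.
for n in range(1, 9):
    print(n, to_parens(tree(n)))

# 12 = 2^2 * 3: the first child is the tree of 2, the second the tree of 1.
print(to_parens(tree(12)))  # ((())())
```

Under this simplified convention the tree records only the recursive shape of the exponents and not which primes occur, so distinct integers such as 2 and 3 share the same tree. A tokenized form of such strings is the kind of "arithmetic text" a sequence model could be trained on, but the exact encoding used for training is not specified in this section.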