LGMLJul 28, 2021

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

arXiv:2107.13163v3117 citations
Originality Incremental advance
AI Analysis

This work addresses theoretical limitations in neural network approximation for researchers, providing more realistic bounds and tools for generalization analysis, though it is incremental in refining existing approximation frameworks.

The paper tackles the problem of unrealistic approximations in neural network theory by introducing a formal definition of statistically meaningful (SM) approximation, which requires good statistical learnability, and shows that overparameterized feedforward nets can SM approximate boolean circuits with polynomial sample complexity in circuit size, and transformers can SM approximate Turing machines with polynomial sample complexity in parameters like alphabet size and log of computation time.

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can SM approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the network. In addition, we show that transformers can SM approximate Turing machines with computation time bounded by $T$ with sample complexity polynomial in the alphabet size, state space size, and $\log (T)$. We also introduce new tools for analyzing generalization which provide much tighter sample complexities than the typical VC-dimension or norm-based bounds, which may be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes