CCAIDSNESep 5, 2024

On the Complexity of Neural Computation in Superposition

arXiv:2409.15318v315 citationsh-index: 58
Originality Highly original
AI Analysis

It provides foundational theoretical limits on model sparsity and capacity, impacting researchers and practitioners in machine learning by clarifying the trade-offs in neural network efficiency and expressibility.

This paper tackles the problem of understanding the computational complexity of neural networks operating in superposition, establishing that computing m' features requires at least Ω(√(m' log m')) neurons and Ω(m' log m') parameters, with a nearly tight upper bound of O(√(m') log m') neurons for certain operations, revealing an exponential gap between computation and representation.

Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $Ω(\sqrt{m' \log m'})$ neurons and $Ω(m' \log m')$ parameters. This implies an explicit limit on how much one can sparsify or distill a model while preserving its expressibility, and complements empirical scaling laws by implying the first subexponential bound on capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between the complexity of computing in superposition (the subject of this work) versus merely representing features, which can require as little as $O(\log m')$ neurons based on the Johnson-Lindenstrauss Lemma. Our work analytically establishes that the number of parameters is a good estimator of the number of features a neural network computes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes