The Serial Scaling Hypothesis
This work highlights a critical blind spot in machine learning that could impact model design and hardware development for sequential problems.
The paper identifies that some problems are inherently sequential and cannot be efficiently parallelized, demonstrating that current parallel architectures and diffusion models face fundamental limitations on such tasks.
While machine learning has advanced through massive parallelization, we identify a critical blind spot: some problems are fundamentally sequential. These "inherently serial" problems, ranging from mathematical reasoning to physical simulations to sequential decision-making, require sequentially dependent computational steps that cannot be efficiently parallelized. We formalize this distinction in complexity theory and demonstrate that current parallel-centric architectures face fundamental limitations on such tasks. We then show, for the first time, that diffusion models, despite their sequential nature, are incapable of solving inherently serial problems. We argue that recognizing the serial nature of computation has profound implications for machine learning, model design, and hardware development.
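To make the distinction between parallelizable and sequentially dependent computation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formalization: the function names `tree_sum` and `iterated_hash` are hypothetical, and iterated hashing is only widely believed, not proven, to resist parallel speedup.

```python
import hashlib


def tree_sum(values):
    """Sum by pairwise reduction. Returns (total, number_of_rounds).

    Every pair within a round is independent of the others, so each round
    could run in parallel; only about log2(n) rounds are needed.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired last element forward
            paired.append(level[-1])
        level = paired
        rounds += 1
    return (level[0] if level else 0), rounds


def iterated_hash(seed: bytes, steps: int) -> bytes:
    """Apply SHA-256 `steps` times in a chain.

    Step t needs the output of step t-1, so the loop has `steps` dependent
    stages. Assumption: no known shortcut computes the result in
    substantially fewer sequential stages.
    """
    state = seed
    for _ in range(steps):
        state = hashlib.sha256(state).digest()
    return state


if __name__ == "__main__":
    total, rounds = tree_sum(range(16))
    print(total, rounds)                      # 16 numbers reduce in 4 rounds
    print(iterated_hash(b"seed", 1000).hex())
```

In this sketch, adding more processors shrinks the wall-clock time of `tree_sum` toward its round count, whereas `iterated_hash` is expected to take time proportional to `steps` regardless of how many processors are available.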