LG AI CCOct 12, 2024

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

arXiv:2410.09375v221.122 citationsh-index: 21AISTATS

Originality Incremental advance

AI Analysis

This demonstrates that simple neural network modules have stronger expressive power than previously thought, potentially reducing reliance on advanced architectures like Transformers for complex tasks.

The paper tackles the problem of whether a ReLU-MLP can function as a universal programmable computer with a practical number of weights, and shows that a looped 23-layer ReLU-MLP can perform basic operations more efficiently and effectively than a looped Transformer.

Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptrons with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons. However, it remains unclear whether a $\mathsf{ReLU}$-$\mathsf{MLP}$ can be made into a universal programmable computer using a practical number of weights. In this work, we provide an affirmative answer that a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ is capable of performing the basic necessary operations, more efficiently and effectively functioning as a programmable computer than a looped Transformer. This indicates simple modules have stronger expressive power than previously expected and have not been fully explored. Our work provides insights into the mechanisms of neural networks and demonstrates that complex tasks, such as functioning as a programmable computer, do not necessarily require advanced architectures like Transformers.

View on arXiv PDF

Similar