Universality of physical neural networks with multivariate nonlinearity
This work addresses the energy-efficiency challenge in AI by providing a theoretical foundation for designing universal physical neural networks, which could enable more efficient hardware for deep learning. It builds incrementally on prior methods for nonlinear optical computation.
The authors address whether physical neural networks can learn arbitrary relationships between data, a key requirement for deep learning known as universality. They present a fundamental theorem that establishes a universality condition and propose a scalable optical architecture that achieves high accuracy on image classification tasks.
The enormous energy demand of artificial intelligence is driving the development of alternative hardware for deep learning. Physical neural networks try to exploit physical systems to perform machine learning more efficiently. In particular, optical systems can calculate with light using negligible energy. While their computational capabilities were long limited by the linearity of optical materials, nonlinear computations have recently been demonstrated through modified input encoding. Despite this breakthrough, our inability to determine whether physical neural networks can learn arbitrary relationships between data (a key requirement for deep learning known as universality) hinders further progress. Here we present a fundamental theorem that establishes a universality condition for physical neural networks. It provides a powerful mathematical criterion that imposes device constraints, detailing how inputs should be encoded in the tunable parameters of the physical system. Based on this result, we propose a scalable architecture using free-space optics that is provably universal and achieves high accuracy on image classification tasks. Further, by combining the theorem with temporal multiplexing, we present a route to potentially huge effective system sizes in highly practical but poorly scalable on-chip photonic devices. Our theorem and scaling methods apply beyond optical systems and inform the design of a wide class of universal, energy-efficient physical neural networks, justifying further efforts in their development.
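As a toy illustration of the input-encoding idea mentioned in the abstract, the following minimal sketch models an idealized, lossless Mach-Zehnder interferometer. It is not the architecture or the theorem of this work; it only shows, under textbook assumptions, that a device which is linear in the optical field can still respond nonlinearly to a value encoded in one of its tunable parameters (here a phase shift). The function names `mzi_transfer`, `apply_transfer`, and `output_intensity` are hypothetical and chosen for this example.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi_transfer(phi):
    """Transfer matrix of an idealized Mach-Zehnder interferometer.

    Assumption: two lossless 50:50 beam splitters with a phase shifter
    of angle phi on the upper arm between them.
    """
    s = 1 / math.sqrt(2)
    beam_splitter = [[s, 1j * s], [1j * s, s]]
    phase_shifter = [[cmath.exp(1j * phi), 0], [0, 1]]
    return matmul2(beam_splitter, matmul2(phase_shifter, beam_splitter))


def apply_transfer(t, field):
    """Apply a 2x2 transfer matrix to a two-mode complex field."""
    return [t[0][0] * field[0] + t[0][1] * field[1],
            t[1][0] * field[0] + t[1][1] * field[1]]


def output_intensity(phi):
    """Intensity at the upper output port for unit light in the upper input.

    For this idealized device the result equals sin(phi / 2) ** 2.
    """
    out = apply_transfer(mzi_transfer(phi), [1.0, 0.0])
    return abs(out[0]) ** 2


if __name__ == "__main__":
    # Linear in the field: doubling the input field doubles the output field.
    t = mzi_transfer(0.7)
    single = apply_transfer(t, [1.0, 0.0])
    double = apply_transfer(t, [2.0, 0.0])
    print(abs(double[0] - 2 * single[0]) < 1e-12)

    # Nonlinear in the encoded parameter: doubling phi does not double
    # the output intensity.
    print(output_intensity(math.pi / 3), output_intensity(2 * math.pi / 3))
```

In this sketch the data would enter through `phi` rather than through the input field, which is the sense in which a purely linear optical element can compute a nonlinear function of its input. Whether a given encoding suffices for universality is the question the theorem addresses; this example makes no claim about that.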