ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks
This work addresses the scalability challenge for Physical Neural Networks (PNNs), which are promising for next-generation computing but lag behind digital networks in scale. It offers a hardware-friendly solution that could improve efficiency in domains such as image classification and natural language processing (NLP).
The paper tackles the problem of scaling Physical Neural Networks (PNNs) to match digital neural network performance. It proposes ReLaX-Net, a parameter-efficient architecture that uses layer-by-layer time-multiplexing to increase effective depth. In numerical experiments, this improves computational performance and scales favorably compared to traditional networks with the same number of parameters.
Physical Neural Networks (PNNs) are promising platforms for next-generation computing systems. However, recent advances in digital neural network performance are largely driven by the rapid growth in the number of trainable parameters, and the PNNs demonstrated so far lag behind by several orders of magnitude in scale. This mirrors the size and performance constraints of early digital neural networks. In that period, efficient reuse of parameters contributed to the development of parameter-efficient architectures such as convolutional neural networks. In this work, we numerically investigate hardware-friendly weight-tying for PNNs. Crucially, in many PNN systems there is a time-scale separation between the fast, dynamic active elements of the forward pass and the slowly trainable elements that implement weights and biases. With this in mind, we propose the Reuse of Layers for eXpanding a Neural Network (ReLaX-Net) architecture, which employs a simple layer-by-layer time-multiplexing scheme to increase the effective network depth and make efficient use of the available parameters. This requires only the addition of fast switches to existing PNNs. We validate ReLaX-Nets via numerical experiments on image classification and natural language processing tasks. Our results show that ReLaX-Net improves computational performance with only minor modifications to a conventional PNN. We observe favorable scaling, where ReLaX-Nets exceed the performance of equivalent traditional RNNs or DNNs with the same number of parameters.
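
The layer-reuse idea can be illustrated with a minimal digital sketch. This is not the authors' implementation: the cyclic reuse order, the tanh nonlinearity, the layer width, and the names `init_layers`, `forward_reuse`, and `count_params` are all assumptions made for illustration. It shows only that passing the signal through the same small stack of layers several times increases the effective depth while the number of trainable parameters stays fixed.

```python
import numpy as np


def init_layers(rng, width, n_layers):
    """Create n_layers distinct (W, b) pairs of a fixed width (hypothetical helper)."""
    scale = 1.0 / np.sqrt(width)
    return [
        (rng.normal(0.0, scale, size=(width, width)), np.zeros(width))
        for _ in range(n_layers)
    ]


def forward_reuse(x, layers, n_reuse):
    """Pass x through the whole stack n_reuse times, reusing the same weights.

    Assumption: layers are revisited cyclically (1, 2, ..., L, 1, 2, ...).
    The abstract only states that the multiplexing is layer-by-layer.
    """
    h = x
    for _ in range(n_reuse):
        for W, b in layers:
            h = np.tanh(h @ W + b)  # tanh stands in for the physical nonlinearity
    return h


def count_params(layers):
    """Count trainable weights and biases in the physical (untied) stack."""
    return sum(W.size + b.size for W, b in layers)


rng = np.random.default_rng(0)
layers = init_layers(rng, width=8, n_layers=2)
x = rng.normal(size=(4, 8))  # batch of 4 inputs with 8 features each

n_reuse = 3
y = forward_reuse(x, layers, n_reuse)

n_params = count_params(layers)           # parameters stay fixed
effective_depth = len(layers) * n_reuse   # depth grows with reuse
print(y.shape, n_params, effective_depth)
```

In this sketch, the tied network is numerically identical to an untied network of the same effective depth whose layers are copies of the shared ones; the difference is that only the shared parameters would need to be stored and trained.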