Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
This work is a significant step towards building scalable hardware for training extreme-scale neural networks, and is particularly relevant to researchers and engineers dealing with the communication challenges of large models.
This paper addresses the communication bottleneck in training large neural networks with backpropagation by proposing a photonic co-processor for Direct Feedback Alignment (DFA). The co-processor is designed to compute random projections with trillions of parameters, and the authors demonstrate it on benchmark tasks using fully-connected and graph convolutional networks.
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues, and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignment, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware that is able to go beyond backpropagation and opens new avenues for deep learning.
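To make the "parallelizable backward pass" concrete, here is a minimal NumPy sketch of the generic Direct Feedback Alignment update rule. It is an illustration only and not the authors' photonic implementation: the function names `init_mlp` and `dfa_step`, the tanh activations, the squared-error loss, the zero-initialized output layer, and the learning rate are all assumptions made for this example. Each hidden layer receives the output error through its own fixed random matrix, so no layer waits on the layer above it. The product `e @ B` is the kind of random projection the abstract describes the co-processor as computing.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Return (weights, feedback) for layer sizes such as [8, 16, 16, 2].

    Hypothetical helper: hidden weights are random, the output layer starts
    at zero, and each hidden layer gets one fixed random feedback matrix.
    """
    weights = [rng.normal(size=(m, n)) / np.sqrt(m)
               for m, n in zip(sizes[:-2], sizes[1:-1])]
    weights.append(np.zeros((sizes[-2], sizes[-1])))
    # Feedback matrices map the output error to each hidden layer's width.
    feedback = [rng.normal(size=(sizes[-1], n)) / np.sqrt(sizes[-1])
                for n in sizes[1:-1]]
    return weights, feedback

def dfa_step(params, feedback, x, y, lr=0.01):
    """One DFA update for a tanh MLP with a linear output and squared error.

    Returns the updated weights and the loss measured before the update.
    """
    # Forward pass, keeping every layer's activation.
    activations = [x]
    for W in params[:-1]:
        activations.append(np.tanh(activations[-1] @ W))
    y_hat = activations[-1] @ params[-1]
    e = y_hat - y  # output error, shape (batch, n_out)

    new_params = []
    # Hidden layers: each update needs only e and that layer's fixed B,
    # so the iterations of this loop are independent of one another.
    for i, (W, B) in enumerate(zip(params[:-1], feedback)):
        h = activations[i + 1]
        delta = (e @ B) * (1.0 - h ** 2)  # random projection of e, times tanh'
        new_params.append(W - lr * activations[i].T @ delta / len(x))
    # The output layer uses its true gradient.
    new_params.append(params[-1] - lr * activations[-1].T @ e / len(x))
    return new_params, float(np.mean(e ** 2))
```

Calling `init_mlp([8, 16, 16, 2], np.random.default_rng(0))` and then `dfa_step` in a loop trains the network without ever transposing the forward weights. In backpropagation, by contrast, the update for layer i depends on the error signal already computed for layer i+1.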