A Scalable Hybrid Training Approach for Recurrent Spiking Neural Networks
This work addresses scalable and efficient training of RSNNs, particularly for neuromorphic systems, although the contribution appears incremental because it builds on existing forward gradient methods.
The authors address efficient training of recurrent spiking neural networks (RSNNs) by introducing HYPR, a hybrid approach that combines parallelization with approximate online forward learning. The result is high-throughput online learning with constant memory demands and an unprecedentedly low performance gap relative to backpropagation through time (BPTT).
Recurrent spiking neural networks (RSNNs) can be implemented very efficiently in neuromorphic systems. Nevertheless, these models are mostly trained with powerful gradient-based learning algorithms on standard digital hardware using backpropagation through time (BPTT). BPTT, however, has substantial limitations: it does not permit online training, and its memory consumption scales linearly with the number of computation steps. In contrast, learning methods that use forward propagation of gradients operate online, with memory consumption independent of the number of time steps. These methods enable SNNs to learn from continuous, infinite-length input sequences. Yet slow execution speed on conventional hardware and inferior performance have hindered their widespread application. In this work, we introduce HYbrid PRopagation (HYPR), which combines the efficiency of parallelization with approximate online forward learning. Our algorithm yields high-throughput online learning through parallelization, paired with constant, i.e., sequence-length-independent, memory demands. HYPR enables parallelization of parameter update computation over subsequences for RSNNs consisting of almost arbitrary non-linear spiking neuron models. We apply HYPR to networks of spiking neurons with oscillatory subthreshold dynamics. We find that this type of neuron model can be trained particularly well with HYPR, resulting in an unprecedentedly low task performance gap between approximate forward gradient learning and BPTT.
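The memory contrast between BPTT and forward propagation of gradients can be illustrated with a small sketch. This is not HYPR and not the authors' implementation: it assumes a single non-spiking leaky integrator v_t = alpha * v_{t-1} + w · x_t with a squared-error loss, and the names `forward_gradient`, `bptt_gradient`, `alpha`, `xs`, and `ys` are hypothetical, chosen only for illustration. In this linear toy case the forward method is exact. For recurrently connected spiking networks, exact forward propagation is far more expensive, which is why approximate variants are used.

```python
import numpy as np

def forward_gradient(w, xs, ys, alpha=0.9):
    """Online gradient of sum_t 0.5 * (v_t - y_t)^2 with respect to w.

    Keeps only the current state, one trace, and the running gradient,
    so memory does not grow with the sequence length.
    """
    v = 0.0
    trace = np.zeros_like(w)  # dv_t/dw, propagated forward in time
    grad = np.zeros_like(w)
    for x, y in zip(xs, ys):
        v = alpha * v + w @ x          # toy leaky-integrator state update
        trace = alpha * trace + x      # forward sensitivity update
        grad += (v - y) * trace        # per-step contribution, usable online
    return grad

def bptt_gradient(w, xs, ys, alpha=0.9):
    """Same gradient via a backward pass; stores every state v_t (O(T) memory)."""
    vs = []
    v = 0.0
    for x in xs:
        v = alpha * v + w @ x
        vs.append(v)
    grad = np.zeros_like(w)
    delta = 0.0  # dL/dv_t, propagated backward in time
    for t in reversed(range(len(xs))):
        delta = (vs[t] - ys[t]) + alpha * delta
        grad += delta * xs[t]
    return grad

# Hypothetical random data, used only to compare the two computations.
rng = np.random.default_rng(0)
xs = rng.normal(size=(50, 3))
ys = rng.normal(size=50)
w = rng.normal(size=3)

g_fwd = forward_gradient(w, xs, ys)
g_bptt = bptt_gradient(w, xs, ys)
print(np.allclose(g_fwd, g_bptt))
```

Both functions compute the same gradient here, but `bptt_gradient` must wait for the end of the sequence and hold all intermediate states, while `forward_gradient` can update after every step.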