Revisiting Batch Normalization for Training Low-latency Deep Spiking Neural Networks from Scratch
This work addresses the inefficient training of SNNs, which are important for energy-efficient neuromorphic computing. It introduces a method that works around the non-differentiable nature of spiking neurons. The contribution is incremental in that it builds on existing batch normalization concepts.
The authors tackled the challenge of training low-latency deep Spiking Neural Networks (SNNs) from scratch by proposing a temporal Batch Normalization Through Time (BNTT) technique, which enabled training on complex datasets like CIFAR-10 and Tiny-ImageNet with only 25-30 time-steps, achieving state-of-the-art results in terms of latency and energy efficiency.
Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous, and binary event (or spike) driven processing, which can yield large energy-efficiency benefits on neuromorphic hardware. However, training high-accuracy, low-latency SNNs from scratch suffers from the non-differentiable nature of a spiking neuron. To address this training issue, we revisit batch normalization and propose a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works have disregarded batch normalization, deeming it ineffective for training temporal SNNs. Unlike previous works, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. The temporally evolving learnable parameters in BNTT allow a neuron to control its spike rate across time-steps, enabling low-latency and low-energy training from scratch. We conduct experiments on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and event-driven DVS-CIFAR10 datasets. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just 25-30 time-steps. We also propose an early exit algorithm that uses the distribution of parameters in BNTT to reduce latency at inference, which further improves energy efficiency.
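To make the idea of decoupling normalization parameters along the time axis concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The class name `BNTT1d`, the helper `first_exit_step`, the use of a scale-only parameter (no shift term), and the exit rule (stop at the first time-step where every scale falls below a threshold) are all assumptions made for illustration. The abstract states only that the parameters are separated per time-step and that early exit uses their distribution.

```python
import math


class BNTT1d:
    """Hypothetical sketch: batch normalization with one learnable scale per time-step."""

    def __init__(self, num_features, num_steps, eps=1e-5):
        self.eps = eps
        # gamma[t][c] is the scale for time-step t and feature c (assumed layout).
        self.gamma = [[1.0] * num_features for _ in range(num_steps)]

    def forward(self, x, t):
        """Normalize a batch x (list of feature lists) using the scale of time-step t."""
        n = len(x)
        num_features = len(x[0])
        out = [[0.0] * num_features for _ in range(n)]
        for c in range(num_features):
            col = [row[c] for row in x]
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
            inv_std = 1.0 / math.sqrt(var + self.eps)
            for i in range(n):
                # Each time-step applies its own scale to the normalized input.
                out[i][c] = self.gamma[t][c] * (col[i] - mean) * inv_std
        return out


def first_exit_step(layers, threshold):
    """Assumed exit rule: first time-step where all scales in all layers are below threshold."""
    num_steps = len(layers[0].gamma)
    for t in range(num_steps):
        if all(abs(g) < threshold for layer in layers for g in layer.gamma[t]):
            return t
    return num_steps  # no early exit; run all time-steps


# Usage: the same batch is scaled differently at different time-steps.
layer = BNTT1d(num_features=2, num_steps=3)
layer.gamma[1] = [0.5, 2.0]    # stands in for values learned during training
layer.gamma[2] = [0.01, 0.02]  # near-zero scales pass little signal to the neuron
batch = [[1.0, 10.0], [3.0, 30.0]]
y0 = layer.forward(batch, 0)
y1 = layer.forward(batch, 1)
exit_t = first_exit_step([layer], threshold=0.1)
```

In this sketch a small scale at a given time-step shrinks the normalized input, which is one way a neuron's spike rate could be modulated over time. A full SNN layer would additionally feed the result into a leaky integrate-and-fire membrane update, which is omitted here.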