LG · Nov 19, 2022

Intelligence Processing Units Accelerate Neuromorphic Learning

Pao-Sheng Vincent Sun, Alexander Titterton, Anjlee Gopiani, Tim Santos, Arindam Basu, Wei D. Lu, Jason K. Eshraghian

arXiv:2211.10725v17.88 citations · h-index: 54 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the energy and latency cost of training spiking neural networks (SNNs), which matters for neuromorphic computing applications. The contribution appears incremental, since it adapts existing methods to a new hardware platform.

The paper tackles the high cost of training spiking neural networks (SNNs) on GPUs by optimizing SNN training for Intelligence Processing Units (IPUs), accelerating it through fine-grained parallelism and half-precision training.

Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs) this becomes more expensive than non-spiking networks. The emergence of Graphcore's Intelligence Processing Units (IPUs) balances the parallelized nature of deep learning workloads with the sequential, reusable, and sparsified nature of operations prevalent when training SNNs. IPUs adopt multi-instruction multi-data (MIMD) parallelism by running individual processing threads on smaller data blocks, which is a natural fit for the sequential, non-vectorized steps required to solve spiking neuron dynamical state equations. We present an IPU-optimized release of our custom SNN Python package, snnTorch, which exploits fine-grained parallelism by utilizing low-level, pre-compiled custom operations to accelerate irregular and sparse data access patterns that are characteristic of training SNN workloads. We provide a rigorous performance assessment across a suite of commonly used spiking neuron models, and propose methods to further reduce training run-time via half-precision training. By amortizing the cost of sequential processing into vectorizable population codes, we ultimately demonstrate the potential for integrating domain-specific accelerators with the next generation of neural networks.
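To illustrate why spiking neuron state equations are sequential in time yet independent across neurons, here is a minimal sketch of a leaky integrate-and-fire (LIF) layer in plain Python. It is not snnTorch or IPU code. The function name `lif_forward`, the parameters `beta` and `threshold`, and the reset-by-subtraction rule are illustrative assumptions, not details taken from the paper.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate a layer of leaky integrate-and-fire neurons (illustrative only).

    inputs: list of time steps, each a list of input currents (one per neuron).
    Returns (spikes, final_membrane), where spikes[t][i] is 0 or 1.
    """
    num_neurons = len(inputs[0])
    mem = [0.0] * num_neurons  # membrane potential, carried across time steps
    spikes = []

    # The time loop is inherently sequential: step t needs the state from t-1.
    for current in inputs:
        step_spikes = []
        # Each neuron updates independently of the others, so this inner loop
        # is the part that can be spread over many parallel threads.
        for i in range(num_neurons):
            mem[i] = beta * mem[i] + current[i]      # leak, then integrate
            fired = 1 if mem[i] > threshold else 0   # threshold crossing
            mem[i] -= fired * threshold              # assumed: reset by subtraction
            step_spikes.append(fired)
        spikes.append(step_spikes)

    return spikes, mem


if __name__ == "__main__":
    # Two neurons, three time steps; only the first neuron receives input.
    out_spikes, out_mem = lif_forward([[0.6, 0.0]] * 3)
    print(out_spikes)
```

The outer loop cannot be vectorized over time because each membrane update depends on the previous one. The inner loop carries no such dependency, which is the kind of fine-grained, per-neuron parallelism the abstract describes.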
