DC, LG, NE, SD | Nov 15, 2017

Chipmunk: A Systolically Scalable 0.9 mm${}^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

arXiv:1711.05734v2 | 48 citations
Originality: Incremental advance
AI Analysis

Chipmunk targets zero-latency voice-based human-machine interfaces on low-power devices. It is an incremental improvement in hardware efficiency for specific RNN applications.

The paper tackles on-device recurrent neural network inference for low-power mobile and wearable devices. It presents Chipmunk, a small hardware accelerator with a measured peak efficiency of 3.08 Gop/s/mW at 1.24 mW peak power. In a 75-tile systolic configuration, the architecture achieves real-time phoneme extraction while consuming less than 13 mW of average power.
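The headline figures can be turned into two back-of-envelope numbers. The sketch below is only arithmetic on the values quoted in the text. It assumes that the 3.08 Gop/s/mW efficiency and the 1.24 mW power refer to the same operating point, and that the array's average power is split evenly across the 75 tiles. All function and constant names are illustrative.

```python
# Figures quoted in the text; the derived values are rough estimates only.
PEAK_EFFICIENCY_GOPS_PER_MW = 3.08   # measured peak efficiency
PEAK_POWER_MW = 1.24                 # power quoted alongside the peak efficiency
TILES = 75                           # tiles in the phoneme-extraction configuration
ARRAY_AVG_POWER_BOUND_MW = 13.0      # "less than 13 mW" average for the whole array


def implied_throughput_gops(efficiency_gops_per_mw: float, power_mw: float) -> float:
    """Throughput implied by an efficiency figure at a given power draw."""
    return efficiency_gops_per_mw * power_mw


def per_tile_power_bound_mw(array_power_mw: float, tiles: int) -> float:
    """Upper bound on average power per tile, assuming an even split (assumption)."""
    return array_power_mw / tiles


throughput = implied_throughput_gops(PEAK_EFFICIENCY_GOPS_PER_MW, PEAK_POWER_MW)
per_tile = per_tile_power_bound_mw(ARRAY_AVG_POWER_BOUND_MW, TILES)
print(f"implied throughput of one engine: {throughput:.2f} Gop/s")
print(f"average power per tile: < {per_tile:.3f} mW")
```

Under these assumptions a single engine delivers roughly 3.8 Gop/s at its most efficient operating point, and each tile in the 75-tile array averages under about 0.17 mW.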

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm${}^2$) hardware accelerator for Long Short-Term Memory RNNs in UMC 65 nm technology that operates at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power.
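The idea of spreading a large model over cooperating engines can be illustrated with the matrix-vector products that dominate LSTM gate computation. The following is a minimal sketch, not the paper's actual dataflow: it assumes each tile of a hypothetical grid keeps one block of the weight matrix locally, and that partial sums are handed from tile to tile along a grid row. The names `tiled_matvec`, `split`, `grid_rows` and `grid_cols` are made up for this example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Reference dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def split(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def tiled_matvec(w: Matrix, x: Vector, grid_rows: int, grid_cols: int) -> Vector:
    """Compute w @ x on a hypothetical grid_rows x grid_cols grid of tiles.

    Tile (r, c) only ever touches its own block of w. Partial sums travel
    left to right along each grid row, so no tile needs the full matrix.
    """
    row_chunks = split(len(w), grid_rows)
    col_chunks = split(len(x), grid_cols)
    y: Vector = []
    for rows in row_chunks:
        partial = [0.0] * len(rows)      # enters the leftmost tile as zeros
        for cols in col_chunks:          # one hop per tile in this grid row
            for i, r in enumerate(rows):
                partial[i] += sum(w[r][c] * x[c] for c in cols)
        y.extend(partial)                # leaves the rightmost tile
    return y


# Small check with integer-valued floats so the comparison is exact.
w = [[float(i + j) for j in range(6)] for i in range(4)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert tiled_matvec(w, x, grid_rows=2, grid_cols=3) == matvec(w, x)
```

Because every tile works only on its own weight block, the weights can stay resident in each engine while only input slices and partial sums move between neighbours, which is the kind of memory-transfer saving the abstract refers to.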
