AS SD · Oct 28, 2019

A Bin Encoding Training of a Spiking Neural Network-based Voice Activity Detection

arXiv:1910.12459v1
Originality: Incremental advance
AI Analysis

This work addresses the need for ultra-low-power audio processing, particularly on neuromorphic chips. It is an incremental advance, because it applies existing SNN architectures to a specific domain.

The paper tackles the high power consumption of voice activity detection systems by developing a spiking neural network-based classifier. The classifier uses a novel bin encoding method to convert audio features into spike patterns and achieves state-of-the-art performance while consuming only 3.8 μW.
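To make the idea of "bin encoding" concrete, here is a minimal sketch of one plausible scheme, assuming (this is not taken from the paper) that each log mel filterbank bin owns a small group of input neurons, that the bin's value is clipped to a fixed range and quantized into equal-width intervals, and that only the neuron for the selected interval spikes. The function name `bin_encode` and the parameters `n_levels`, `lo` and `hi` are hypothetical.

```python
from typing import List


def bin_encode(frame: List[float], n_levels: int, lo: float, hi: float) -> List[int]:
    """Turn one time frame of log mel filterbank values into a binary spike vector.

    Hypothetical scheme: bin b owns neurons b*n_levels .. b*n_levels + n_levels - 1.
    Each value is clipped to [lo, hi] and quantized into one of n_levels
    equal-width intervals; only the neuron for that interval fires.
    """
    if n_levels < 1 or hi <= lo:
        raise ValueError("need n_levels >= 1 and hi > lo")
    spikes = [0] * (len(frame) * n_levels)
    width = (hi - lo) / n_levels
    for b, value in enumerate(frame):
        clipped = min(max(value, lo), hi)
        # The upper edge (clipped == hi) falls into the last interval.
        level = min(int((clipped - lo) / width), n_levels - 1)
        spikes[b * n_levels + level] = 1
    return spikes


# Three filterbank bins, four levels each, value range [-10, 10].
print(bin_encode([-10.0, 0.0, 5.0], n_levels=4, lo=-10.0, hi=10.0))
```

Under this assumed scheme, every frame produces exactly one spike per filterbank bin. The resulting input is therefore very sparse, which is the property that low-power neuromorphic hardware relies on.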

Advances of deep learning for Artificial Neural Networks (ANNs) have led to significant improvements in the performance of digital signal processing systems implemented on digital chips. Although recent progress in low-power chips is remarkable, neuromorphic chips that run applications based on Spiking Neural Networks (SNNs) offer even lower power consumption, as a consequence of their sparse spike-based coding scheme. In this work, we develop an SNN-based Voice Activity Detection (VAD) system, which is a building block of any audio and speech processing system. We propose bin encoding, a novel method to convert the log mel filterbank bins of single time frames into spike patterns. We integrate the proposed scheme into a bilayer spiking architecture, which we evaluated on the QUT-NOISE-TIMIT corpus. Our approach shows that SNNs enable an ultra-low-power implementation of a VAD classifier that consumes only 3.8 μW while achieving state-of-the-art performance.
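The following sketch shows how a bilayer spiking classifier might consume spike patterns such as those produced by bin encoding. It assumes discrete-time leaky integrate-and-fire (LIF) neurons with reset-to-zero after firing. It also assumes that the predicted class is the output neuron with the most spikes, with index 0 taken as non-speech and index 1 as speech. The neuron model, decay, threshold, readout rule and all names (`lif_layer`, `classify`) are hypothetical. The paper's actual architecture and training procedure are not reproduced here, and only the forward pass is shown.

```python
from typing import List


def lif_layer(inputs: List[List[int]], weights: List[List[float]],
              decay: float = 0.9, threshold: float = 1.0) -> List[List[int]]:
    """Simulate one layer of leaky integrate-and-fire neurons over time.

    inputs:  T time steps, each a list of N_in binary spikes.
    weights: N_out rows of N_in synaptic weights.
    Returns T time steps of N_out binary output spikes.
    """
    potentials = [0.0] * len(weights)
    outputs = []
    for x in inputs:
        step = []
        for j, row in enumerate(weights):
            # Leak the membrane potential, then add weighted input spikes.
            potentials[j] = decay * potentials[j] + sum(w * s for w, s in zip(row, x))
            if potentials[j] >= threshold:
                step.append(1)
                potentials[j] = 0.0  # reset after firing
            else:
                step.append(0)
        outputs.append(step)
    return outputs


def classify(spike_train: List[List[int]], w_hidden: List[List[float]],
             w_out: List[List[float]]) -> int:
    """Hidden LIF layer -> output LIF layer -> index of the most active output neuron."""
    hidden = lif_layer(spike_train, w_hidden)
    output = lif_layer(hidden, w_out)
    counts = [sum(step[k] for step in output) for k in range(len(w_out))]
    return max(range(len(counts)), key=counts.__getitem__)


# Toy example with hand-picked (not trained) weights: assumed 0 = non-speech, 1 = speech.
train = [[1, 0]] * 3
print(classify(train, w_hidden=[[1.0, 0.0], [0.0, 1.0]], w_out=[[0.0, 1.0], [1.0, 0.0]]))
```

In this sketch, neurons do work only when spikes arrive or when their potentials decay. On event-driven neuromorphic hardware, that sparse activity is what keeps power consumption low.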
