SD · LG · NE · Feb 24, 2017

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

arXiv:1702.07787v1 · 105 citations
Originality: Incremental advance
AI Analysis

This work addresses environmental audio tagging for domestic audio scenes, representing an incremental improvement over existing methods.

The paper tackles audio tagging by proposing a convolutional gated recurrent neural network with spatial features. Compared with the authors' earlier DNN-based method, it reduces the equal error rate (EER) from 0.13 to 0.11 on the development set, and it reaches a state-of-the-art EER of 0.12 on the evaluation set.
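The equal error rate is the operating point at which a tag detector's false positive rate equals its false negative rate, so lower is better. The following is a minimal sketch for a single tag, assuming per-chunk probability scores and binary labels: it tries every observed score as a threshold and returns the average of the two rates where they are closest. The function name `equal_error_rate` is hypothetical, and the official DCASE 2016 evaluation code may interpolate between thresholds differently.

```python
def equal_error_rate(scores, labels):
    """Approximate EER for one tag (sketch, not the official DCASE scorer).

    scores: predicted probabilities that the tag is present, one per chunk
    labels: 1 if the tag is present in the chunk, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative chunk")

    best_gap, best_eer = float("inf"), None
    # Candidate thresholds: every observed score, plus one above the maximum
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    for t in thresholds:
        fpr = sum(s >= t for s in neg) / len(neg)  # absent tags predicted present
        fnr = sum(s < t for s in pos) / len(pos)   # present tags that were missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


# Usage: perfectly separated scores give an EER of 0.0
print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 0.0
```

With overlapping scores, for example `[0.9, 0.4, 0.6, 0.1]` with labels `[1, 1, 0, 0]`, the two error rates meet at 0.5. A system-level figure is typically obtained by averaging the per-tag EERs, which is again an assumption about the aggregation rather than a detail stated here.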

Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms, or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn from the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. End-to-end learning on raw waveforms also achieves comparable performance. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, compared with 0.15 for the best existing system.
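The recurrent stage described in the abstract (per-frame features fed to a GRU-based RNN, followed by tag predictions) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the CNN front end and the auxiliary spatial-feature CNN are omitted, random `frames` stand in for CNN outputs, averaging the GRU states over time before the output layer is an assumed pooling choice, and all names (`GRUCell`, `tag_probabilities`) and sizes are hypothetical. The state update follows one common GRU convention, h_t = (1 - z_t) * n_t + z_t * h_{t-1}.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Matrix (list of rows) times vector
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    # Elementwise sum of equal-length vectors
    return [sum(xs) for xs in zip(*vs)]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell with update gate z, reset gate r, candidate state n."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.p = {}
        for gate in ("z", "r", "n"):
            self.p["W" + gate] = rand_matrix(hidden_dim, input_dim, rng)
            self.p["U" + gate] = rand_matrix(hidden_dim, hidden_dim, rng)
            self.p["b" + gate] = [0.0] * hidden_dim

    def step(self, x, h):
        p = self.p
        z = [sigmoid(v) for v in add(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
        r = [sigmoid(v) for v in add(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        n = [math.tanh(v) for v in add(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
        # Interpolate between the candidate state and the previous state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def tag_probabilities(frames, cell, hidden_dim, w_out, b_out):
    """Run the GRU over all frames, average its states, output one sigmoid per tag."""
    h = [0.0] * hidden_dim
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    pooled = [sum(col) / len(states) for col in zip(*states)]
    return [sigmoid(v) for v in add(matvec(w_out, pooled), b_out)]


# Usage with random weights and random stand-in features (sizes are illustrative)
rng = random.Random(0)
input_dim, hidden_dim, num_tags = 4, 8, 7
cell = GRUCell(input_dim, hidden_dim, rng)
frames = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(10)]
w_out = rand_matrix(num_tags, hidden_dim, rng)
b_out = [0.0] * num_tags
probs = tag_probabilities(frames, cell, hidden_dim, w_out, b_out)
print([round(p, 3) for p in probs])
```

Each tag gets an independent sigmoid rather than a shared softmax because several audio events can be present in the same chunk, so the outputs need not sum to one.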

Code Implementations: 2 repos