SD | LG | NE | Apr 4, 2016

Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings

arXiv:1604.00861v1 | 337 citations
Originality: Highly original
AI Analysis

This work addresses the problem of detecting multiple overlapping sounds in everyday environments, with applications such as audio analysis and monitoring. It represents a strong incremental advance.

The paper tackles polyphonic sound event detection in real-life recordings using a bidirectional LSTM RNN. It achieves an average F1-score of 65.5% on 1-second blocks and 64.7% on single frames, relative improvements of 6.8% and 15.1%, respectively, over the previous state of the art.

In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal consisting of sounds from multiple classes, to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed method outperforms previous approaches by a large margin, and the results are further improved using data augmentation techniques. Overall, our system reports an average F1-score of 65.5% on 1 second blocks and 64.7% on single frames, a relative improvement over previous state-of-the-art approach of 6.8% and 15.1% respectively.
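The two reported figures differ only in the time resolution at which predicted and reference activity indicators are compared. As a minimal sketch of that idea (not the paper's evaluation code), the snippet below computes F1 on binary activity matrices at frame level and after pooling frames into blocks. The function names `f1_score` and `to_blocks`, the toy data, and the choice to pool true/false positives over the whole recording are assumptions made for illustration; the exact protocol, frame length, and averaging scheme used by the authors are not stated on this page.

```python
from typing import List

# Binary activity indicators, indexed as activity[frame][event_class].
Activity = List[List[int]]


def f1_score(reference: Activity, predicted: Activity) -> float:
    """F1 over all (time step, class) cells, pooling counts across the recording."""
    tp = fp = fn = 0
    for ref_row, pred_row in zip(reference, predicted):
        for ref, pred in zip(ref_row, pred_row):
            if ref and pred:
                tp += 1      # event active and detected
            elif pred:
                fp += 1      # detected but not active
            elif ref:
                fn += 1      # active but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def to_blocks(activity: Activity, frames_per_block: int) -> Activity:
    """Mark a class active in a block if it is active in any frame of that block."""
    blocks = []
    for start in range(0, len(activity), frames_per_block):
        chunk = activity[start:start + frames_per_block]
        blocks.append([int(any(column)) for column in zip(*chunk)])
    return blocks


# Toy example (hypothetical data): 4 frames, 2 event classes, 2 frames per block.
reference = [[1, 0], [1, 0], [0, 1], [0, 1]]
predicted = [[1, 0], [0, 0], [0, 1], [1, 1]]

frame_f1 = f1_score(reference, predicted)
block_f1 = f1_score(to_blocks(reference, 2), to_blocks(predicted, 2))
print(f"frame-level F1: {frame_f1:.2f}, block-level F1: {block_f1:.2f}")
```

In this toy case the block-level score is higher than the frame-level one, because a detection that is off by one frame still counts as correct when both frames fall in the same block.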

Code Implementations: 2 repos
