AS · LG · SD · May 13, 2020

Memory Controlled Sequential Self Attention for Sound Recognition

arXiv:2005.06650v4 · 3 citations
AI Analysis

This work addresses sound recognition for applications like audio analysis, but it is incremental as it builds on existing self-attention and CRNN methods with a focus on memory control.

The paper tackles the problem of polyphonic sound event detection by investigating the role of memory in sequential self-attention, proposing a memory-controlled mechanism integrated with a CRNN model. The result is an event-based F-score of 33.92% on the URBAN-SED dataset, outperforming a baseline without self-attention that scored 20.10%.

In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition. We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED). Experiments on the URBAN-SED dataset demonstrate the impact of the extent of memory on sound recognition performance with the self attention induced SED model. We extend the proposed idea with a multi-head self attention mechanism where each attention head processes the audio embedding with explicit attention width values. The proposed use of memory controlled sequential self attention offers a way to induce relations among frames of sound event tokens. We show that our memory controlled self attention model achieves an event-based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% reported by the model without self attention.
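To make the idea of an attention width concrete, here is a minimal NumPy sketch of self attention in which each frame attends only to frames within a fixed distance. It is an illustration under stated assumptions, not the paper's implementation: the symmetric window, the absence of learned query/key/value projections, the concatenation of heads, and the function names `windowed_self_attention` and `multi_width_attention` are all hypothetical choices made for this example.

```python
import numpy as np


def windowed_self_attention(x, width):
    """Scaled dot-product self attention with a limited memory.

    x     : (T, d) array of frame embeddings (e.g. CRNN outputs).
    width : frame t may only attend to frames s with |t - s| <= width
            (assumed symmetric window; the paper may define it differently).
    Returns the attended embeddings (T, d) and the attention weights (T, T).
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                  # frame-to-frame similarity
    idx = np.arange(T)
    inside = np.abs(idx[:, None] - idx[None, :]) <= width
    scores = np.where(inside, scores, -np.inf)     # hide frames outside the window
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                       # masked entries become 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


def multi_width_attention(x, widths):
    """One head per attention width; head outputs are concatenated (assumption)."""
    heads = [windowed_self_attention(x, w)[0] for w in widths]
    return np.concatenate(heads, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((10, 4))          # 10 frames, 4-dim embeddings
    out, weights = windowed_self_attention(frames, width=2)
    print(out.shape, weights.shape)
    print(multi_width_attention(frames, widths=[1, 3]).shape)
```

Setting `width` to at least the sequence length minus one removes the mask and recovers ordinary full self attention, so the width acts as the knob that controls the extent of memory.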

Code Implementations: 1 repo
