SDAS Oct 25, 2020

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

arXiv:2010.13092v4 108 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement to sound event localization and detection systems that raises accuracy when detecting and localizing multiple simultaneous sound events.

The paper tackles two problems in polyphonic sound event localization and detection: overlapping events of the same type that arrive from different directions, and the performance loss caused by hard parameter-sharing. The proposed improved network outperforms previous methods and is comparable to state-of-the-art ensemble models.

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. First, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output format and solve the accompanying track permutation problem with permutation-invariant training. Multi-head self-attention is further used to separate tracks. Second, a previous finding is that, by using hard parameter-sharing, SELD suffers from a performance loss compared with learning the subtasks separately. This is solved by a soft parameter-sharing scheme. We call the proposed method Event Independent Network V2 (EINV2), which is an improved version of our previously proposed method and an end-to-end network for SELD. We show that our proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
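
The track permutation problem mentioned in the abstract can be illustrated with a minimal sketch of permutation-invariant training. This is not the authors' implementation: the function names `track_loss` and `pit_loss`, the squared-error loss, and the toy two-track DoA vectors are all assumptions chosen for illustration. The idea is that a trackwise output has no fixed order, so the loss is computed for every assignment of predicted tracks to target tracks and the smallest one is used.

```python
from itertools import permutations


def track_loss(pred, target):
    """Squared error between one predicted track and one target track (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def pit_loss(pred_tracks, target_tracks):
    """Return (minimum total loss, best permutation) over all track assignments.

    best permutation[i] is the index of the target matched to predicted track i.
    """
    n = len(pred_tracks)
    best_loss, best_perm = None, None
    for perm in permutations(range(n)):
        # Total loss if predicted track i is paired with target track perm[i].
        loss = sum(
            track_loss(pred_tracks[i], target_tracks[j])
            for i, j in enumerate(perm)
        )
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: two overlapping events of the same class with different DoAs,
# each given as (azimuth, elevation) in degrees. Values are hypothetical.
pred = [[30.0, 0.0], [-60.0, 10.0]]
target = [[-58.0, 10.0], [31.0, 0.0]]

loss, perm = pit_loss(pred, target)
print(loss, perm)
```

Here the swapped assignment gives the lower loss, so the predictions are not penalized for emitting the two events in a different track order than the labels. Enumerating all permutations is only practical for a small number of tracks, and a real SELD system would combine SED and DoA terms in the per-track loss.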

Code Implementations: 3 repos
