AS, LG, SD, ML · Aug 2, 2019

Sound source detection, localization and classification using consecutive ensemble of CRNN models

arXiv:1908.00766v2 · 0.0073 citations
AI Analysis 60

This work addresses sound source detection and localization for audio processing applications, presenting an incremental improvement over existing methods.

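The paper tackles sound event localization and detection by splitting the task across four CRNN models that run consecutively: one estimates the number of active sources, one estimates the direction of arrival of a single source, one estimates the direction of arrival of a second source given the first, and one performs multi-label classification. The approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic dataset.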

In this paper, we describe our method for DCASE 2019 Task 3: Sound Event Localization and Detection (SELD). We use four SELDnet-like, single-output CRNN models that run consecutively to recover all available information about occurring events. We decompose the SELD task into four subtasks: estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of a second source when the direction of the first is known, and multi-label classification. We use a custom consecutive ensemble to predict each event's onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic dataset and compared with other participants' submissions.
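The consecutive decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the four models are stand-in callables, and the names `consecutive_seld`, `active_segments`, `count_model`, `doa1_model`, `doa2_model` and `class_model` are hypothetical, as is the per-frame dictionary layout. The sketch assumes at most two simultaneously active sources and only shows how the output of one stage can gate or condition the next.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]        # placeholder for one frame of audio features
DOA = Tuple[float, float]      # (azimuth, elevation) in degrees


def consecutive_seld(
    frames: Sequence[Frame],
    count_model: Callable[[Frame], int],         # hypothetical: number of active sources
    doa1_model: Callable[[Frame], DOA],          # hypothetical: DOA of one source
    doa2_model: Callable[[Frame, DOA], DOA],     # hypothetical: second DOA, given the first
    class_model: Callable[[Frame], List[str]],   # hypothetical: multi-label classifier
) -> List[Dict]:
    """Run the four stages in order, frame by frame."""
    results = []
    for frame in frames:
        n = count_model(frame)
        doas: List[DOA] = []
        if n >= 1:
            first = doa1_model(frame)
            doas.append(first)
            if n >= 2:
                # The second DOA estimate is conditioned on the first one.
                doas.append(doa2_model(frame, first))
        classes = class_model(frame) if n >= 1 else []
        results.append({"n_sources": n, "doas": doas, "classes": classes})
    return results


def active_segments(results: List[Dict]) -> List[Tuple[int, int]]:
    """Return (onset, offset) frame indices of active runs; offset is exclusive."""
    segments = []
    start = None
    for i, r in enumerate(results):
        active = r["n_sources"] > 0
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(results)))
    return segments


# Toy stand-ins for the trained models (assumptions for illustration only).
demo_frames = [[0], [1], [2], [0]]   # the first value encodes the source count
demo_results = consecutive_seld(
    demo_frames,
    count_model=lambda f: int(f[0]),
    doa1_model=lambda f: (10.0, 0.0),
    doa2_model=lambda f, first: (first[0] + 90.0, first[1]),
    class_model=lambda f: ["speech"],
)
demo_segments = active_segments(demo_results)
```

With these toy stand-ins, the source-count stage decides which later stages run for each frame, and onsets and offsets fall out of the frames where the count changes between zero and non-zero. A real system would replace the lambdas with trained CRNNs and would need its own rule for assigning classes to individual sources.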

