SDAS · Nov 2, 2020

Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss

arXiv:2011.00695v1 · 4 citations
AI Analysis

This work addresses data scarcity in sound event detection and offers an incremental improvement for audio processing applications.

The paper tackles the problem of limited labeled data for sound event detection by using synthetic data to improve the feature representation, achieving competitive results on the DCASE 2018 Task 4 test set and the DCASE 2019 Task 4 synthetic set.

Because strongly labeled sound event detection datasets are limited, using synthetic data to improve sound event detection performance has become a new research focus. In this paper, we exploit synthetic data to improve the feature representation. Based on metric learning, we propose an inter-frame distance loss function for domain adaptation and demonstrate its effectiveness on sound event detection. We also apply multi-task learning with synthetic data. We find that the best performance is achieved when the two methods are used together. Experiments on the DCASE 2018 Task 4 test set and the DCASE 2019 Task 4 synthetic set both show competitive results.
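
The abstract does not give the formula for the inter-frame distance loss, so the following is only a minimal sketch of what a metric-learning loss over frame embeddings might look like. It uses the generic pairwise contrastive form (pull same-label frames together, push different-label frames at least a margin apart), which is an assumption and not necessarily the paper's formulation. The function name, the `margin` parameter, and the per-frame integer labels are all hypothetical.

```python
import math


def inter_frame_distance_loss(embeddings, labels, margin=1.0):
    """Contrastive-style loss averaged over all pairs of frames.

    Assumption: this is the generic pairwise contrastive loss, used here
    as a stand-in; the paper's actual loss may differ.

    embeddings: list of equal-length float vectors, one per frame.
    labels:     list of per-frame class labels (hypothetical input format).
    margin:     minimum desired distance between frames of different classes.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")

    total = 0.0
    n_pairs = 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            # Euclidean distance between the two frame embeddings.
            dist = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            )
            if labels[i] == labels[j]:
                # Same class: penalize any distance between the frames.
                total += dist ** 2
            else:
                # Different class: penalize only if closer than the margin.
                total += max(0.0, margin - dist) ** 2
            n_pairs += 1

    # Fewer than two frames yields no pairs and therefore zero loss.
    return total / n_pairs if n_pairs else 0.0
```

In a multi-task setup such as the one the abstract mentions, a term like this would typically be added to the detection loss with a weighting factor. That weighting is likewise an assumption here, not a detail taken from the paper.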
