SD AS Nov 2, 2020

Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss

Yuxin Huang, Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian, Min Liu, Kazushige Ouchi

arXiv:2011.00695v16.24 citations

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in sound event detection; it is an incremental advance for audio processing applications.

The paper tackles the problem of limited labeled data for sound event detection by using synthetic data to improve the feature representation, achieving competitive results on the DCASE 2018 task 4 test set and the DCASE 2019 task 4 synthetic set.

Because strongly labeled data sets for sound event detection are limited, using synthetic data to improve system performance has become a new research focus. In this paper, we exploit synthetic data to improve the feature representation. Based on metric learning, we propose an inter-frame distance loss function for domain adaptation and demonstrate its effectiveness on sound event detection. We also apply multi-task learning with synthetic data. We find that the best performance is achieved when the two methods are used together. Experiments on the DCASE 2018 task 4 test set and the DCASE 2019 task 4 synthetic set both show competitive results.
