SD | LG | Aug 10, 2017

DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

arXiv:1708.03211v2 | 39 citations
AI Analysis

This work addresses the detection of rare audio events for audio analysis applications. It is an incremental improvement over existing approaches, with large gains on this specific task.

The authors tackled audio event detection for rare sounds by proposing a system using CNNs and DNNs with weighted and multi-task loss functions, improving F-score from 72.7% to 90.0% and reducing error rate from 0.53 to 0.18 on development data.

This report presents our audio event detection system submitted for Task 2, "Detection of rare sound events", of the DCASE 2017 challenge. The proposed system is based on convolutional neural networks (CNNs) and deep neural networks (DNNs) coupled with novel weighted and multi-task loss functions and state-of-the-art phase-aware signal enhancement. The loss functions are tailored for audio event detection in audio streams. The weighted loss is designed to tackle the common issue of imbalanced data in background/foreground classification, while the multi-task loss enables the networks to simultaneously model the class distribution and the temporal structures of the target events for recognition. Our proposed systems significantly outperform the challenge baseline, improving the average F1-score from 72.7% to 90.0% and reducing the average detection error rate from 0.53 to 0.18 on the development data. On the evaluation data, our submission obtains an average F1-score of 88.3% and an error rate of 0.22, which are significantly better than those obtained by the DCASE baseline (i.e. an F1-score of 64.1% and an error rate of 0.64).
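
To make the idea of a weighted loss for imbalanced background/foreground classification concrete, here is a minimal sketch in Python. It is a generic class-weighted binary cross-entropy, not the authors' formulation, which the abstract does not spell out. The function names, the `w_fg`/`w_bg` parameters, and the inverse-frequency weighting scheme are all assumptions made for illustration.

```python
import math


def weighted_binary_cross_entropy(probs, labels, w_fg=1.0, w_bg=1.0, eps=1e-12):
    """Mean binary cross-entropy with separate weights per class.

    probs  : predicted foreground probabilities, one per frame
    labels : 1 for foreground (target event), 0 for background
    w_fg, w_bg : hypothetical weights for foreground and background errors
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_fg * math.log(p)
        else:
            total += -w_bg * math.log(1.0 - p)
    return total / len(probs)


def inverse_frequency_weights(labels):
    """One common (assumed) choice: weight each class by n / (2 * class count)."""
    n = len(labels)
    n_fg = sum(labels)
    n_bg = n - n_fg
    if n_fg == 0 or n_bg == 0:
        raise ValueError("both classes must be present")
    return n / (2.0 * n_fg), n / (2.0 * n_bg)


# Usage: one foreground frame among three background frames.
labels = [1, 0, 0, 0]
w_fg, w_bg = inverse_frequency_weights(labels)
loss = weighted_binary_cross_entropy([0.5, 0.5, 0.5, 0.5], labels, w_fg, w_bg)
```

With these weights, the single foreground frame contributes as much to the loss as the three background frames combined, so the rare class is not swamped by the majority class during training.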
