ASSD · Mar 23, 2021

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

arXiv:2103.12388v2 · 12 citations
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of labeled data in audio analysis for applications such as sound recognition and event detection. It is an incremental advance, since it builds on an existing teacher-student framework.

The paper tackles the problem of improving weakly supervised audio tagging and acoustic event detection by proposing a joint framework with deep feature distillation and adaptive focal loss, achieving competitive F1-scores of 81.2% for audio tagging and 49.8% for acoustic event detection on the DCASE 2019 Task 4 dataset.
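The adaptive focal loss builds on the standard focal loss of Lin et al., which scales each binary cross-entropy term by a modulating factor (1 - p_t)^γ so that well-classified (easy) events contribute less than misclassified (hard) ones. The sketch below is a minimal plain-Python illustration of that standard, non-adaptive loss for one multi-label audio clip, assuming one sigmoid output per event class. The function names and the fixed γ are hypothetical, and the paper's own rule for adapting the loss is not reproduced here.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Standard binary focal loss for one event class: -(1 - p_t)^gamma * log(p_t).

    p is the sigmoid probability that the event is present; y is the 0/1 weak label.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the correct label
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def clip_focal_loss(probs: Sequence[float], labels: Sequence[int], gamma: float = 2.0) -> float:
    """Mean focal loss over all event classes of one weakly labeled clip.

    HYPOTHETICAL helper: a fixed gamma is used; the paper's adaptive
    adjustment of the loss is not reproduced.
    """
    terms = [binary_focal_loss(p, y, gamma) for p, y in zip(probs, labels)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    for gamma in (0.0, 2.0):
        easy = binary_focal_loss(0.9, 1, gamma)  # confidently correct positive
        hard = binary_focal_loss(0.1, 1, gamma)  # badly missed positive
        print(f"gamma={gamma}: easy={easy:.4f} hard={hard:.4f} ratio={hard / easy:.1f}")
```

With γ = 0 the loss reduces to plain binary cross-entropy; raising γ shrinks the loss of an easy positive (p = 0.9) far more than that of a hard one (p = 0.1), which is the hard/easy reweighting an adaptive focal loss tunes.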

A good joint training framework can substantially improve the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously. In this study, we propose three methods to improve the best-performing teacher-student framework in the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 for both the audio tagging and acoustic event detection tasks. First, we propose a frame-level, target-event-based deep feature distillation, which leverages the limited strong-labeled data in the weakly supervised framework to learn better intermediate feature maps. Then, we propose an adaptive focal loss and a two-stage training strategy to enable effective and more accurate model training, in which the contributions of hard and easy acoustic events to the total cost function are adjusted automatically. Furthermore, we design an event-specific post-processing method to improve the prediction of target event time-stamps. Our experiments are performed on the public DCASE 2019 Task 4 dataset, and the results show that our approach achieves competitive performance in both the AT (81.2% F1-score) and AED (49.8% F1-score) tasks.
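The event-specific post-processing step can be pictured as class-dependent smoothing of frame-level predictions. The sketch below assumes a recipe similar to the DCASE 2019 Task 4 baseline (threshold the frame probabilities, median-filter the binary decisions, then read off onset/offset segments), but with a separate median-filter window per event class, so that long stationary events such as Vacuum_cleaner can use wider windows than short ones such as Speech. The window sizes, the 0.5 threshold, the frame hop, and all function names are hypothetical choices for illustration; the paper's exact post-processing may differ.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (onset_s, offset_s)


def median_filter(x: Sequence[float], window: int) -> List[float]:
    """1-D median filter with an odd window; edges are padded by repeating the end values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not x:
        return []
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sorted(padded[i:i + window])[half] for i in range(len(x))]


def frames_to_segments(active: Sequence[float], frame_hop_s: float) -> List[Segment]:
    """Convert a 0/1 frame sequence into (onset, offset) segments in seconds."""
    segments: List[Segment] = []
    start = None
    for i, a in enumerate(active):
        if a and start is None:
            start = i  # event switches on
        elif not a and start is not None:
            segments.append((start * frame_hop_s, i * frame_hop_s))  # event switches off
            start = None
    if start is not None:  # event still active at the end of the clip
        segments.append((start * frame_hop_s, len(active) * frame_hop_s))
    return segments


def event_specific_postprocess(
    frame_probs: Dict[str, Sequence[float]],
    windows: Dict[str, int],
    threshold: float = 0.5,
    frame_hop_s: float = 0.02,
) -> Dict[str, List[Segment]]:
    """HYPOTHETICAL post-processing: per-class median window, shared threshold."""
    results: Dict[str, List[Segment]] = {}
    for label, probs in frame_probs.items():
        binary = [1.0 if p > threshold else 0.0 for p in probs]
        smoothed = median_filter(binary, windows[label])
        results[label] = frames_to_segments(smoothed, frame_hop_s)
    return results


if __name__ == "__main__":
    probs = {
        "Speech": [0.9, 0.1, 0.9, 0.9, 0.2, 0.1, 0.1],
        "Vacuum_cleaner": [0.8, 0.8, 0.3, 0.8, 0.8, 0.8, 0.8],
    }
    windows = {"Speech": 3, "Vacuum_cleaner": 5}  # hypothetical per-class windows
    print(event_specific_postprocess(probs, windows, frame_hop_s=0.5))
    # {'Speech': [(0.0, 2.0)], 'Vacuum_cleaner': [(0.0, 3.5)]}
```

Widening a class's window bridges short gaps: the 0.3 dip in Vacuum_cleaner is smoothed over with a window of 5, but with a window of 1 the same event would be split into two segments. Choosing such windows per event class is the kind of trade-off an event-specific post-processing step can tune.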
