SD AS

Feb 14, 2020

Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels

Keisuke Imoto, Noriyuki Tonami, Yuma Koizumi, Masahiro Yasuda, Ryosuke Yamanishi, Yoichi Yamashita

arXiv:2002.05848v1

16.444 citations

h-index: 39

Originality: Incremental advance

AI Analysis

This work addresses sound event detection for environmental sound analysis, presenting an incremental improvement over existing multitask learning approaches.

The paper tackles sound event detection with a multitask learning method that uses soft scene labels to model how strongly sound events and acoustic scenes are related, improving SED performance by 3.80% in F-score over conventional MTL-based SED.

Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed joint analyses of sound events and acoustic scenes based on multitask learning (MTL), in which the knowledge of sound events and scenes can help in estimating them mutually. The conventional MTL-based methods utilize one-hot scene labels to train the relationship between sound events and scenes; thus, the conventional methods cannot model the extent to which sound events and scenes are related. However, in the real environment, common sound events may occur in some acoustic scenes; on the other hand, some sound events occur only in a limited acoustic scene. In this paper, we thus propose a new method for SED based on MTL of SED and ASC using the soft labels of acoustic scenes, which enable us to model the extent to which sound events and scenes are related. Experiments conducted using TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets show that the proposed method improves the SED performance by 3.80% in F-score compared with conventional MTL-based SED.
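The idea in the abstract can be illustrated with a minimal sketch of a multitask loss whose scene branch is trained against a soft label distribution instead of a one-hot vector. The abstract does not say how the soft labels are obtained or how the two losses are weighted, so everything specific here is an assumption made for illustration: the temperature softmax over hypothetical teacher logits, the `asc_weight` parameter, and all function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sed_loss(event_probs, event_targets, eps=1e-7):
    """Mean binary cross-entropy over event classes (multi-label detection)."""
    total = 0.0
    for p, t in zip(event_probs, event_targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(event_probs)


def asc_loss(scene_logits, scene_targets, eps=1e-7):
    """Cross-entropy between predicted scene probabilities and a target
    distribution, which may be one-hot or soft."""
    probs = softmax(scene_logits)
    return -sum(t * math.log(max(q, eps)) for q, t in zip(probs, scene_targets))


def multitask_loss(event_probs, event_targets, scene_logits, scene_targets,
                   asc_weight=1.0):
    """Weighted sum of the SED and ASC losses (the weighting is an assumption)."""
    return (sed_loss(event_probs, event_targets)
            + asc_weight * asc_loss(scene_logits, scene_targets))


# Hypothetical teacher logits for three scenes; the soft label spreads
# probability mass over related scenes instead of picking exactly one.
teacher_logits = [2.0, 1.0, 0.0]
soft_scene_label = softmax(teacher_logits, temperature=2.0)

loss = multitask_loss(
    event_probs=[0.9, 0.2, 0.6],      # predicted event activations
    event_targets=[1.0, 0.0, 1.0],    # ground-truth event activity
    scene_logits=[1.5, 0.5, -0.5],    # scene-branch outputs
    scene_targets=soft_scene_label,
)
print(soft_scene_label, loss)
```

With a one-hot `scene_targets`, `asc_loss` reduces to the usual cross-entropy on the correct scene, which corresponds to the conventional setup the abstract contrasts against.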
