SD, AS | Feb 3, 2021

Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance

Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo

arXiv:2102.01927v1 | 15.5 | 24 citations

Originality: Incremental advance

AI Analysis

This research addresses data imbalance in Sound Event Detection (SED), a problem that limits detection accuracy for practitioners working with real-world audio data.

This paper investigates the impact of varying sound event durations and the prevalence of inactive frames on Sound Event Detection (SED) performance. It explores four different loss functions—simple reweighting, inverse frequency, asymmetric focal, and focal batch Tversky loss—to address the data imbalance caused by these factors.
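To make the idea of an asymmetric focal loss concrete, here is a minimal sketch in plain Python. The summary above does not give the formula, so the form used here is an assumption: a focal-style binary cross-entropy with separate exponents for active and inactive frames. The names `asymmetric_focal_loss`, `gamma`, and `zeta` are illustrative and may not match the paper's definition.

```python
import math


def asymmetric_focal_loss(targets, probs, gamma=0.0, zeta=1.0, eps=1e-7):
    """Mean frame-level loss over (frame, class) pairs.

    targets: list of 0/1 activity labels.
    probs:   list of predicted activity probabilities in [0, 1].
    gamma:   exponent that down-weights easy ACTIVE frames (assumed form).
    zeta:    exponent that down-weights easy INACTIVE frames (assumed form).
    """
    total = 0.0
    for z, y in zip(targets, probs):
        # Clip to avoid log(0).
        y = min(max(y, eps), 1.0 - eps)
        active_term = (1.0 - y) ** gamma * z * math.log(y)
        inactive_term = y ** zeta * (1 - z) * math.log(1.0 - y)
        total -= active_term + inactive_term
    return total / len(targets)


# With both exponents at 0 this reduces to plain binary cross-entropy.
plain_bce = asymmetric_focal_loss([1, 0], [0.9, 0.2], gamma=0.0, zeta=0.0)

# A confidently predicted inactive frame contributes less once zeta > 0.
easy_inactive_bce = asymmetric_focal_loss([0], [0.1], gamma=0.0, zeta=0.0)
easy_inactive_afl = asymmetric_focal_loss([0], [0.1], gamma=0.0, zeta=2.0)
```

Using different exponents for the two terms lets the many easy inactive frames be suppressed without also suppressing the rarer active frames.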

In many sound event detection (SED) methods, a segmented time frame is regarded as one data sample for model training. The durations of sound events greatly depend on the sound event class, e.g., the sound event "fan" has a long duration, whereas the sound event "mouse clicking" is instantaneous. Thus, the difference in duration between sound event classes results in a serious data imbalance in SED. Moreover, most sound events occur only occasionally; therefore, there are many more inactive time frames of sound events than active frames. This also causes a severe data imbalance between active and inactive frames. In this paper, we investigate the impact of sound duration and inactive frames on SED performance by introducing four loss functions: simple reweighting loss, inverse frequency loss, asymmetric focal loss, and focal batch Tversky loss. Then, we provide insights into how to tackle this imbalance problem.
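The frame-level imbalance described in the abstract can be illustrated with a small sketch. It counts active frames per class in a toy label matrix and derives per-class weights as the total number of frames divided by the active-frame count. This weighting rule is an assumption chosen for illustration and is not necessarily the paper's inverse frequency loss; the function names and the toy data are hypothetical.

```python
def active_frame_counts(labels):
    """Count active frames per class.

    labels: list of frames, each a list of 0/1 flags, one flag per class.
    """
    n_classes = len(labels[0])
    return [sum(frame[c] for frame in labels) for c in range(n_classes)]


def inverse_frequency_weights(labels):
    """Per-class weight = total frames / active frames (assumed rule).

    Classes with no active frames get weight n_frames to avoid
    division by zero.
    """
    counts = active_frame_counts(labels)
    n_frames = len(labels)
    return [n_frames / max(count, 1) for count in counts]


# Toy clip with 10 frames and two classes: [fan, mouse clicking].
# "fan" is active in 8 frames; "mouse clicking" is active in only 1.
labels = [[1, 0]] * 7 + [[1, 1]] + [[0, 0]] * 2

counts = active_frame_counts(labels)         # [8, 1]
weights = inverse_frequency_weights(labels)  # [1.25, 10.0]
```

In this toy clip the short event receives a weight eight times larger than the long one, which is the kind of correction a frequency-based reweighting aims for.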
