VOICe: A Sound Event Detection Dataset For Generalizable Domain Adaptation
This work provides a new dataset to help researchers tackle domain shift in sound event detection. The contribution is incremental, as it builds on existing domain adaptation concepts.
The paper introduces VOICe, a dataset for domain adaptation in sound event detection that targets the performance degradation seen under unseen conditions, such as different recording devices or ambient noise. It also evaluates an adversarial-training-based domain adaptation method on this dataset.
The performance of sound event detection methods can degrade significantly when they are used in unseen conditions (e.g., different recording devices or ambient noise). Domain adaptation is a promising way to tackle this problem. In this paper, we present VOICe, the first dataset for the development and evaluation of domain adaptation methods for sound event detection. VOICe consists of mixtures of three different sound events ("baby crying", "glass breaking", and "gunshot") superimposed on three different categories of acoustic scenes: vehicle, outdoors, and indoors. The mixtures are also offered without any background noise. VOICe is freely available online (https://doi.org/10.5281/zenodo.3514950). In addition, we evaluate the performance of an adversarial-training-based domain adaptation method on VOICe.
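The abstract does not say how the events are superimposed on the scenes. As an illustration only, the sketch below shows one common way to build such a mixture: scale an event so that its RMS level sits a chosen number of decibels above the background, add it at a given onset, and keep a background-free copy alongside the noisy one. The function names (`rms`, `make_mixture`), the RMS-based SNR definition, and the synthetic signals are assumptions for this example, not the VOICe generation procedure.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def make_mixture(event, background, snr_db, onset):
    """Hypothetical helper: superimpose `event` on `background` at sample `onset`.

    Assumption: SNR is defined as the event RMS relative to the RMS of the
    whole background clip. Returns (noisy mixture, background-free version,
    (onset, offset) sample indices of the event).
    """
    if onset < 0 or onset + len(event) > len(background):
        raise ValueError("event does not fit inside the background at this onset")
    # Gain chosen so that 20*log10(gain * rms(event) / rms(background)) == snr_db.
    gain = (rms(background) / rms(event)) * 10 ** (snr_db / 20)
    # Background-free version: the scaled event placed in silence.
    clean = [0.0] * len(background)
    for i, sample in enumerate(event):
        clean[onset + i] = gain * sample
    # Noisy version: background plus the scaled event.
    noisy = [b + c for b, c in zip(background, clean)]
    return noisy, clean, (onset, onset + len(event))


# Usage with synthetic stand-ins for a scene recording and an event clip.
random.seed(0)
sr = 1000  # assumed sample rate for this toy example
background = [random.gauss(0.0, 0.1) for _ in range(2 * sr)]
event = [math.sin(2 * math.pi * 50 * t / sr) for t in range(sr // 2)]

noisy, clean, (onset, offset) = make_mixture(event, background, snr_db=6.0, onset=300)
measured = 20 * math.log10(rms(clean[onset:offset]) / rms(background))
print(round(measured, 3))  # 6.0
```

Returning the onset and offset together with the audio reflects that sound event detection needs time-stamped labels; keeping the background-free copy mirrors the abstract's note that the mixtures are also offered without background noise.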