SD · CV · LG · AS · Sep 13, 2022

Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification

arXiv:2209.05900v1 · 7 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses acoustic scene analysis for machine listening systems. It is an incremental advance: it builds on existing joint-task approaches and adds new input features.

The paper tackles sound event detection and acoustic scene classification as joint tasks by investigating spatial audio features in a deep neural network model. It finds that specific binaural features, mainly GCC-phat and the sines and cosines of phase differences, improve performance over baseline methods that use logmel energies alone.

Sound event detection (SED) and acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Considering the information shared between sound events and acoustic scenes, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features in training a joint deep neural network (DNN) model performing SED and ASC. Experiments are performed on two different datasets containing binaural recordings and synchronous sound event and acoustic scene labels to analyse the differences between performing SED and ASC separately or jointly. The presented results show that the use of specific binaural features, mainly the Generalized Cross Correlation with Phase Transform (GCC-phat) and sines and cosines of phase differences, results in a better-performing model in both separate and joint tasks as compared with baseline methods based on logmel energies only.
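To make the two binaural features named in the abstract concrete, here is a minimal sketch of how GCC-phat and the sines and cosines of the inter-channel phase difference could be computed for a single frame of a two-channel signal. It is not the paper's pipeline: the function names, the absence of windowing, the frame length, the sign convention (positive lag means the right channel is delayed), and the naive O(N^2) DFT used to keep the example dependency-free are all assumptions made for illustration. A real system would typically use an FFT library and compute the features per STFT frame.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def binaural_frame_features(left, right, eps=1e-12):
    """Return (gcc_phat, sin_ipd, cos_ipd) for one frame.

    Hypothetical helper: `left` and `right` are equal-length sample lists.
    """
    spec_l = dft(left)
    spec_r = dft(right)
    # Cross-spectrum; its phase is phase(right) - phase(left) in each bin.
    cross = [r * l.conjugate() for l, r in zip(spec_l, spec_r)]
    # PHAT weighting: discard magnitude, keep only the phase term.
    phat = [c / (abs(c) + eps) for c in cross]
    # GCC-phat is the inverse transform of the phase-only cross-spectrum.
    gcc_phat = [v.real for v in idft(phat)]
    # Sine and cosine of the inter-channel phase difference per bin.
    ipd = [cmath.phase(c) for c in cross]
    sin_ipd = [math.sin(p) for p in ipd]
    cos_ipd = [math.cos(p) for p in ipd]
    return gcc_phat, sin_ipd, cos_ipd


def estimate_delay(gcc_phat):
    """Lag (in samples) of the GCC-phat peak, mapped to a signed value."""
    n = len(gcc_phat)
    peak = max(range(n), key=lambda i: gcc_phat[i])
    return peak if peak <= n // 2 else peak - n
```

As a usage example, take a 16-sample frame with an impulse at sample 3 in the left channel and at sample 5 in the right channel. `estimate_delay` applied to the resulting GCC-phat sequence returns a lag of 2 samples under the sign convention assumed here. Using the sine and cosine of the phase difference, rather than the raw angle, avoids the discontinuity at ±π.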

