SD · LG · AS · May 25, 2021

Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices

arXiv:2105.11856v1
Originality: Incremental advance
AI Analysis

The work addresses a practical problem for audio classification systems deployed in real-world settings, where device variability can degrade performance, though it is an incremental improvement over existing techniques.

The paper tackles the failure of acoustic scene classification to generalize across mismatched recording devices by introducing a straightforward method that corrects device frequency responses. The method took first place in the mismatched-device scenario of the DCASE 2019 challenge with 75% accuracy.

Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses. In this work, a relatively straightforward method is introduced to address this problem. Two variants of the approach are presented: the first requires aligned examples from multiple devices, while the second removes this requirement. The method works for both time-domain and frequency-domain representations of audio recordings. Further, its relation to standardization and Cepstral Mean Subtraction is analysed. The proposed approach is effective even when very few examples are provided. The method was developed during the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and won 1st place in the scenario with mismatched recording devices, with an accuracy of 75%. Source code for the experiments can be found online.
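To make the idea of correcting a device's frequency response concrete, here is a minimal sketch in Python with NumPy. It is not the paper's code: the function names, the use of time-averaged magnitude spectra, the geometric mean in the aligned case, and the synthetic data are all assumptions chosen for illustration. The sketch estimates one gain per frequency bin, either from aligned pairs (the same scene captured on a reference device and on a target device) or from unaligned collections of recordings, and then rescales a target-device spectrogram with those gains.

```python
import numpy as np

def coeffs_from_aligned(ref_spectra, dev_spectra, eps=1e-8):
    """Hypothetical variant with aligned pairs.

    ref_spectra, dev_spectra: shape (n_pairs, n_bins), time-averaged
    magnitude spectra of the same scenes on the reference and target device.
    Returns one gain per frequency bin.
    """
    ratios = ref_spectra / np.maximum(dev_spectra, eps)
    # Geometric mean over pairs (an assumption made for this sketch).
    return np.exp(np.mean(np.log(np.maximum(ratios, eps)), axis=0))

def coeffs_from_unaligned(ref_spectra, dev_spectra, eps=1e-8):
    """Hypothetical variant without pairing: compare average spectra."""
    return ref_spectra.mean(axis=0) / np.maximum(dev_spectra.mean(axis=0), eps)

def apply_correction(magnitude_spectrogram, coeffs):
    """Scale each frequency bin of a (n_bins, n_frames) spectrogram."""
    return magnitude_spectrogram * coeffs[:, None]

# Synthetic check: the target device is modelled as a fixed per-bin gain h.
rng = np.random.default_rng(0)
n_pairs, n_bins, n_frames = 20, 64, 10
h = rng.uniform(0.5, 1.5, size=n_bins)             # assumed device response
ref = rng.uniform(0.5, 2.0, size=(n_pairs, n_bins))
dev = ref * h                                       # same scenes, other device

coeffs_aligned = coeffs_from_aligned(ref, dev)
coeffs_unaligned = coeffs_from_unaligned(ref, dev)

clean = rng.uniform(0.5, 2.0, size=(n_bins, n_frames))
corrected = apply_correction(clean * h[:, None], coeffs_unaligned)
```

In this toy setting both estimates recover the inverse of the simulated response `h`, so the corrected spectrogram matches the clean one. Note that in the log-magnitude domain the per-bin multiplication becomes a per-bin additive offset, which is the sense in which a correction of this kind resembles mean normalization of features.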

Code Implementations: 1 repo
