ASSD
Apr 23, 2019

Acoustic scene classification using teacher-student learning with soft-labels

arXiv:1904.10135v2
30 citations
Originality: Incremental advance
AI Analysis

This work aims to improve acoustic scene classification accuracy, particularly for scenes that share acoustic properties. It is an incremental advance because it builds on existing teacher-student learning methods.

The paper tackles acoustic scene classification in the case where spectral information is not mutually exclusive across classes. It uses teacher-student learning with soft-labels to account for similarities between scenes, and reports a classification accuracy of 77.36% on the DCASE 2018 task 1 validation set.

Acoustic scene classification assigns an input segment to one of a set of pre-defined classes using spectral information. The spectral information of acoustic scenes may not be mutually exclusive, because different classes share common acoustic properties, such as the babble noise present in both airports and shopping malls. However, the conventional training procedure based on one-hot labels does not consider the similarities between different acoustic scenes. We exploit teacher-student learning to derive soft-labels that reflect common acoustic properties among different acoustic scenes. In teacher-student learning, the teacher network produces soft-labels, on which the student network is then trained. We investigate various methods of extracting soft-labels that better represent similarities across different scenes. These include extracting soft-labels from multiple audio segments that belong to the same acoustic scene. Experimental results demonstrate the potential of our approach, showing a classification accuracy of 77.36% on the DCASE 2018 task 1 validation set.
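
The teacher-student idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy logits, the function names (`softmax`, `class_mean_soft_labels`, `soft_cross_entropy`), the `temperature` parameter, and the choice to average teacher posteriors over all segments of a scene are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax. `temperature` is an illustrative assumption;
    values above 1 flatten the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_mean_soft_labels(teacher_logits, labels, num_classes, temperature=1.0):
    """Average the teacher's posteriors over all segments that share a scene label.
    Row c of the result is the soft-label used for scene c.
    Assumes every class has at least one segment."""
    probs = softmax(teacher_logits, temperature)
    soft = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        soft[c] = probs[labels == c].mean(axis=0)
    return soft

def soft_cross_entropy(student_logits, soft_targets):
    """Cross-entropy between soft targets and the student's predictions,
    averaged over segments."""
    log_q = np.log(softmax(student_logits) + 1e-12)
    return float(-(soft_targets * log_q).sum(axis=-1).mean())

# Toy teacher outputs for 3 hypothetical scenes: 0 = airport, 1 = shopping mall, 2 = park.
teacher_logits = np.array([
    [4.0, 3.0, 0.0],  # airport segment that also resembles a shopping mall
    [3.5, 3.0, 0.5],  # airport segment
    [2.5, 4.0, 0.0],  # shopping mall segment
    [0.0, 0.5, 5.0],  # park segment
])
labels = np.array([0, 0, 1, 2])

soft = class_mean_soft_labels(teacher_logits, labels, num_classes=3)
targets = soft[labels]              # each segment receives its scene's soft-label
student_logits = np.zeros((4, 3))   # an untrained student predicting uniformly
loss = soft_cross_entropy(student_logits, targets)
```

In this toy setup the soft-label for the airport class places noticeably more mass on the shopping mall class than on the park class, which is the kind of cross-scene similarity a one-hot label cannot express. A real student network would minimise `loss` by gradient descent, possibly combined with the usual one-hot loss.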

