SD · LG · AS · Mar 14, 2025

Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

arXiv:2503.11363v1 · 2 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in optimizing teacher models for low-complexity acoustic scene classification systems. It is incremental, however, because it builds on established knowledge distillation techniques.

The study investigated how teacher model attributes affect student performance in knowledge distillation for acoustic scene classification, finding that teacher size, device generalization methods, ensembling strategy, and ensemble size are key factors.

Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proved to be highly effective in building well-performing low-complexity Acoustic Scene Classification (ASC) systems and was used in all the top-ranked submissions to this task of the annual DCASE challenge in the past three years. There is extensive research available on establishing the KD process, designing efficient student models, and forming well-performing teacher ensembles. However, less research has been conducted on investigating which teacher model attributes are beneficial for low-complexity students. In this work, we try to close this gap by studying the effects on the student's performance when using different teacher network architectures, varying the teacher model size, training them with different device generalization methods, and applying different ensembling strategies. The results show that teacher model sizes, device generalization methods, the ensembling strategy and the ensemble size are key factors for a well-performing student network.
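To make the setup concrete, here is a minimal sketch of a distillation loss with a teacher ensemble. The abstract does not give the loss or the ensembling rule, so everything here is an assumption: the loss is the common temperature-scaled formulation (hard-label cross-entropy plus KL divergence to softened teacher outputs), the ensemble simply averages teacher logits, and the names `softmax`, `average_teacher_logits`, `kd_loss`, `temperature`, and `kd_weight` are hypothetical, with arbitrary default values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teacher_logits(teacher_logits):
    """Assumed ensembling rule: average the logits of all teachers per class."""
    n = len(teacher_logits)
    return [sum(per_class) / n for per_class in zip(*teacher_logits)]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, kd_weight=0.5):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
    # Cross-entropy of the student against the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor keeps the soft-label term on a comparable gradient scale.
    return (1 - kd_weight) * ce + kd_weight * temperature ** 2 * kl


# Two hypothetical teachers scoring three scene classes for one clip.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
ensemble = average_teacher_logits(teachers)
loss = kd_loss([1.0, 0.2, -0.3], ensemble, label=0)
```

In this sketch, the factors the abstract names would enter only through the teacher logits: a larger teacher, a different device generalization method, or a different ensemble size or strategy changes `ensemble`, while the student-side loss stays the same.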

