SD LG AS · Oct 27, 2021

Temporal Knowledge Distillation for On-device Audio Classification

arXiv:2110.14131v2 · 32 citations
Originality: Incremental advance
AI Analysis

This work addresses audio classification on mobile devices. It offers a new distillation approach that is an incremental advance over existing methods.

The paper aims to improve on-device audio classification models with a new knowledge distillation method that transfers temporal information from large transformer-based models. The method applies to various student architectures, including CNNs and RNNs, and improves predictive performance in experiments on an audio event detection dataset and a noisy keyword spotting dataset.

Improving the performance of on-device audio classification models remains a challenge given the computational limits of the mobile environment. Many studies leverage knowledge distillation to boost predictive performance by transferring knowledge from large models to on-device models. However, most either lack a mechanism to distill temporal information, which is crucial to audio classification tasks, or require the large and on-device models to share a similar architecture. In this paper, we propose a new knowledge distillation method designed to incorporate the temporal knowledge embedded in the attention weights of large transformer-based models into on-device models. Our distillation method is applicable to various types of architectures, including non-attention-based architectures such as CNNs and RNNs, while retaining the original network architecture during inference. Through extensive experiments on both an audio event detection dataset and a noisy keyword spotting dataset, we show that our proposed method improves predictive performance across diverse on-device architectures.
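The idea of distilling temporal knowledge from attention weights can be illustrated with a minimal sketch. This is not the paper's formulation; the abstract does not give the loss. The sketch assumes that the teacher's attention matrix is collapsed into one weight per audio frame by averaging over query positions, that the student emits one score per frame from a hypothetical auxiliary head used only during training, and that the two distributions are matched with a KL divergence. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_temporal_weights(attn):
    """Collapse a (T x T) attention matrix into one weight per frame.

    Assumption: each row is a distribution over key frames, so the
    column-wise mean over query rows is again a distribution over frames.
    """
    num_queries = len(attn)
    num_frames = len(attn[0])
    return [sum(row[t] for row in attn) / num_queries for t in range(num_frames)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def temporal_distillation_loss(teacher_attn, student_frame_scores):
    """Hypothetical auxiliary loss: match student frame weights to the teacher's."""
    p = teacher_temporal_weights(teacher_attn)
    q = softmax(student_frame_scores)
    return kl_divergence(p, q)


if __name__ == "__main__":
    # Toy teacher attention over 3 frames; each row sums to 1.
    teacher_attn = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
    # Toy student scores that focus on the wrong (last) frame.
    student_scores = [0.0, 0.0, 5.0]
    print(temporal_distillation_loss(teacher_attn, student_scores))
```

In training, a term like this would typically be added to the usual classification loss with a weighting factor, for example `total = task_loss + lam * temporal_distillation_loss(...)`, where `lam` is an assumed hyperparameter. Because the frame-score head exists only to compute this term, it could be dropped at inference, which is one way to keep the student's original architecture unchanged as the abstract describes.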

