SD | LG | AS | Sep 15, 2023

Two-Step Knowledge Distillation for Tiny Speech Enhancement

arXiv:2309.08144v1 | 15 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, tiny models in embedded audio machine learning, offering incremental improvements in speech enhancement under challenging scenarios.

The paper tackles the problem of compressing speech enhancement models for embedded applications. It proposes a two-step knowledge distillation approach that yields signal-to-distortion ratio (SDR) gains of 0.9 dB and 1.1 dB over the baseline under adverse conditions (-5 dB input SNR and 63x compression).

Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved by distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we first pre-train the student using only the knowledge distillation (KD) objective, after which we switch to a fully supervised training regime. We also propose a novel fine-grained similarity-preserving KD loss, which aims to match the student's intra-activation Gram matrices to those of the teacher. Our method demonstrates broad improvements, but particularly shines in adverse conditions including high compression and low signal-to-noise ratios (SNR), yielding signal-to-distortion ratio gains of 0.9 dB and 1.1 dB, respectively, at -5 dB input SNR and 63x compression compared to baseline.
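
The two ideas in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names (`gram_matrix`, `similarity_kd_loss`, `loss_weights`), the row-wise L2 normalisation, the mean-squared-error comparison, and the choice to treat each row as one position (for example, a time frame) within a single activation are all assumptions made for illustration. The point it shows is that an n x n Gram matrix can be compared between teacher and student even when their feature widths differ, and that the two-step schedule uses either the KD loss or the supervised loss, never a weighted mix.

```python
import math


def gram_matrix(acts):
    """Row-normalised Gram matrix of one activation.

    `acts` is a list of n feature vectors (assumed here: one per position
    inside a single activation). The result is n x n regardless of the
    feature width, so teacher and student matrices are comparable.
    """
    n = len(acts)
    gram = [
        [sum(a * b for a, b in zip(acts[i], acts[j])) for j in range(n)]
        for i in range(n)
    ]
    normalised = []
    for row in gram:
        # L2-normalise each row; fall back to 1.0 to avoid dividing by zero.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        normalised.append([v / norm for v in row])
    return normalised


def similarity_kd_loss(student_acts, teacher_acts):
    """Mean squared difference between student and teacher Gram matrices."""
    g_student = gram_matrix(student_acts)
    g_teacher = gram_matrix(teacher_acts)
    n = len(g_student)
    total = sum(
        (g_student[i][j] - g_teacher[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def loss_weights(epoch, kd_epochs):
    """Two-step schedule: (kd_weight, supervised_weight) for a given epoch.

    Step 1 (epoch < kd_epochs): distillation objective only.
    Step 2 (afterwards): supervised objective only.
    `kd_epochs` is a hypothetical hyperparameter, not a value from the paper.
    """
    return (1.0, 0.0) if epoch < kd_epochs else (0.0, 1.0)
```

In a training loop, the total loss at a given epoch would be `w_kd * kd_loss + w_sup * supervised_loss` with `(w_kd, w_sup) = loss_weights(epoch, kd_epochs)`, so the student is first shaped purely by the teacher and then trained purely on the ground-truth targets.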

