AS · CL · LG · SD · Feb 26, 2024

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

arXiv:2402.16830v1
4 citations · h-index: 31
2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
Originality: Incremental advance
AI Analysis

This work addresses the efficiency of self-supervised speech models and represents an incremental improvement over existing knowledge distillation techniques.

The paper aims to make speech self-supervised learning more efficient by introducing SKILL, a method that distills knowledge across groups of layers chosen by similarity measures. The resulting distilled model outperforms prior methods and achieves state-of-the-art results in the 30M-parameter class across several SUPERB tasks.

Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To improve its efficiency, previous works often rely on compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this line of research by introducing SKILL, a novel method that distills across groups of layers rather than individual, arbitrarily selected layers of the teacher network. The layers to distill are identified through hierarchical clustering applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
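The layer-grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a symmetric layer-to-layer similarity matrix is already available (the values below are made up), that clusters are merged with average linkage, and that each group's distillation target is the mean of its teacher layers. The abstract specifies none of these choices. The names `average_linkage_clusters` and `group_targets` are hypothetical.

```python
from itertools import combinations


def average_linkage_clusters(similarity, num_groups):
    """Group layers by agglomerative clustering on a similarity matrix.

    Starts with one cluster per layer and repeatedly merges the two
    clusters with the highest average pairwise similarity until only
    `num_groups` clusters remain. Average linkage is an assumption here.
    """
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        total = sum(similarity[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > num_groups:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def group_targets(layer_outputs, clusters):
    """Average the teacher layer outputs within each cluster.

    Using the per-group mean as the distillation target is an assumption
    made for illustration.
    """
    targets = []
    for group in clusters:
        dim = len(layer_outputs[group[0]])
        targets.append(
            [sum(layer_outputs[i][d] for i in group) / len(group) for d in range(dim)]
        )
    return targets


# Made-up similarity matrix for a 6-layer teacher (illustrative values only).
similarity = [
    [1.0, 0.9, 0.3, 0.2, 0.2, 0.1],
    [0.9, 1.0, 0.4, 0.3, 0.2, 0.1],
    [0.3, 0.4, 1.0, 0.8, 0.7, 0.2],
    [0.2, 0.3, 0.8, 1.0, 0.85, 0.3],
    [0.2, 0.2, 0.7, 0.85, 1.0, 0.4],
    [0.1, 0.1, 0.2, 0.3, 0.4, 1.0],
]

# Toy 2-dimensional "representations", one vector per teacher layer.
layer_outputs = [[float(i), 2.0 * i] for i in range(6)]

groups = average_linkage_clusters(similarity, num_groups=3)
targets = group_targets(layer_outputs, groups)
print(groups)   # [[0, 1], [2, 3, 4], [5]]
print(targets)  # [[0.5, 1.0], [3.0, 6.0], [5.0, 10.0]]
```

In a real setting the similarity matrix would be computed from teacher activations, and the student would be trained to match one target per group rather than a hand-picked subset of individual layers.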

