LG AI SD AS | Jun 25, 2025

DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective

Hyung Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, Ahmed Hussen Abdelaziz

Apple

arXiv:2507.02911v19.42 citations | h-index: 73 | INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient, high-performance speech models in applications such as automatic speech recognition. The advance is incremental, since it builds on existing distillation methods and on HuBERT itself.

The paper tackles the problem of compressing HuBERT, a self-supervised speech model. It introduces DiceHuBERT, a knowledge distillation framework that trains the student with the same SSL objective used to pre-train HuBERT, so no additional modules are needed. On the SUPERB benchmark, DiceHuBERT improves on existing distillation methods by over 21% in phoneme recognition and by more than 14% in ASR.

We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT's iterative self-distillation mechanism by directly replacing the original model with a student model. This replacement allows the student to be trained using the same SSL objective used when pre-training HuBERT, eliminating the need for additional modules or architectural constraints. Experimental results on SUPERB show that DiceHuBERT consistently outperforms existing distillation methods, improving phoneme recognition performance by over 21% and ASR performance by more than 14%. Furthermore, DiceHuBERT demonstrates competitive performance across multiple tasks, highlighting its clear advantage.
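To make the training signal concrete, here is a minimal NumPy sketch of a HuBERT-style masked-prediction loss applied to a student model. It is an illustration under stated assumptions, not the authors' implementation: it assumes the targets are k-means cluster IDs computed from teacher features, and it uses a plain softmax cross-entropy over masked frames in place of HuBERT's similarity-based scoring against codeword embeddings. The function names, shapes, and random inputs are all hypothetical.

```python
import numpy as np

def assign_pseudo_labels(teacher_feats, centroids):
    """Nearest-centroid cluster ID per frame (the k-means assignment step).

    teacher_feats: (T, D) frame features, assumed to come from the teacher.
    centroids:     (K, D) cluster centers, assumed to be fitted beforehand.
    """
    diffs = teacher_feats[:, None, :] - centroids[None, :, :]  # (T, K, D)
    sq_dists = (diffs ** 2).sum(axis=-1)                       # (T, K)
    return sq_dists.argmin(axis=1)                             # (T,)

def masked_prediction_loss(student_logits, labels, mask):
    """Mean cross-entropy between student predictions and cluster IDs,
    averaged over masked frames only.

    student_logits: (T, K) unnormalized scores from the student.
    labels:         (T,)   integer cluster IDs.
    mask:           (T,)   boolean, True where the input frame was masked.
    """
    # Numerically stable log-softmax over the K clusters.
    shifted = student_logits - student_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return nll[mask].mean()

# Toy data standing in for real teacher features and student outputs.
rng = np.random.default_rng(0)
T, D, K = 50, 16, 8
teacher_feats = rng.normal(size=(T, D))
centroids = rng.normal(size=(K, D))

labels = assign_pseudo_labels(teacher_feats, centroids)
mask = rng.random(T) < 0.5
mask[0] = True  # guarantee at least one masked frame

student_logits = rng.normal(size=(T, K))
loss = masked_prediction_loss(student_logits, labels, mask)
print(f"masked-prediction loss: {loss:.4f}")
```

In this sketch the student only ever sees discrete targets, never the teacher's intermediate activations, which is why no layer-wise or feature-wise mapping between the two models is required and the student's architecture is left unconstrained.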
