SD · AI · AS · May 28, 2025

Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates

Haoning Xu, Zhaoqing Li, Youjun Chen, Huimeng Wang, Guinan Li, Mengzhe Geng, Chengxi Deng, Xunying Liu

arXiv:2505.22608v1 · 4.0 · h-index: 20 · INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient compression of large speech models for deployment, offering an incremental improvement over existing methods.

This paper compresses speech foundation models by integrating pruning and parameter updates into a single stage. It reduces the parameter counts of wav2vec2.0-base and HuBERT-large by 65% and 60% respectively, with no statistically significant word error rate (WER) increase on the LibriSpeech test-clean set, and it needs at least 25% less compression time than previously published methods.
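The parameter reductions quoted above can be restated as compression ratios, which makes them easier to compare with results reported in "x times smaller" form. The sketch below assumes the usual definition, ratio = original parameters / remaining parameters; the function names are illustrative and do not come from the paper.

```python
def compression_ratio(reduction: float) -> float:
    """Convert a fractional parameter reduction (e.g. 0.65) to a ratio.

    Assumes ratio = original_params / remaining_params.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def parameter_reduction(ratio: float) -> float:
    """Inverse mapping: convert a compression ratio back to a fractional reduction."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return 1.0 - 1.0 / ratio


# Reductions reported for the two models in the summary.
for name, reduction in [("wav2vec2.0-base", 0.65), ("HuBERT-large", 0.60)]:
    print(f"{name}: {reduction:.0%} fewer parameters = {compression_ratio(reduction):.2f}x")
```

Under this definition, a 65% reduction corresponds to roughly 2.86x and a 60% reduction to 2.5x.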

This paper presents a novel approach for speech foundation model compression that tightly integrates model pruning and parameter updates into a single stage. Highly compact layer-level tied self-pinching gates, each containing only a single learnable threshold, are jointly trained with uncompressed models and used in fine-grained neuron-level pruning. Experiments conducted on the LibriSpeech-100hr corpus suggest that our approach reduces the number of parameters of wav2vec2.0-base and HuBERT-large models by 65% and 60% respectively, while incurring no statistically significant word error rate (WER) increase on the test-clean dataset. Compared to previously published methods on the same task, our approach not only achieves the lowest WER of 7.05% on the test-clean dataset under a comparable model compression ratio of 4.26x, but also operates with at least 25% less model compression time.
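The abstract does not give the gate's formula, so the following is only a minimal sketch of the general idea of a threshold-based gate: one threshold per layer, a smooth sigmoid gate that could be trained jointly with the weights, and a hard cut at the same threshold when pruning. The importance measure (mean absolute weight per neuron), the temperature value, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math


def neuron_importance(row):
    """Assumed importance score: mean absolute weight of one neuron's row."""
    return sum(abs(w) for w in row) / len(row)


def soft_gate(row, threshold, temperature=500.0):
    """Smooth gate in (0, 1): near 0 below the threshold, near 1 above it.

    The sigmoid keeps the gate differentiable, so a single per-layer
    threshold could in principle be learned together with the weights.
    """
    return 1.0 / (1.0 + math.exp(-temperature * (neuron_importance(row) - threshold)))


def prune_layer(weights, threshold):
    """Hard pruning: keep only neurons whose importance exceeds the threshold."""
    return [row for row in weights if neuron_importance(row) > threshold]


# Toy layer: 3 neurons with tiny weights, 5 neurons with larger weights.
layer = [[0.001] * 16 for _ in range(3)] + [[0.1] * 16 for _ in range(5)]
threshold = 0.02  # the single (here hand-picked) threshold shared by the layer

gates = [soft_gate(row, threshold) for row in layer]
pruned = prune_layer(layer, threshold)
print(len(layer), "->", len(pruned), "neurons kept")
```

In this toy layer the three small-weight neurons receive gates close to 0 and are removed, while the other five are kept. Because the threshold is shared across the layer, each layer adds only one gate parameter.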
