Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
This work aims to improve both generalization and compression in ASR models by identifying architectural asymmetries in how components respond to pruning, though it is incremental in that it applies sensitivity diagnostics to an existing model.
The paper reinterprets neural network pruning as an implicit regularizer for automatic speech recognition, showing that targeted pruning of specific components of Whisper-small, such as decoder self-attention and the last encoder layers, reduces word error rate by up to 2.38% absolute on LibriSpeech test-other without fine-tuning.
We challenge the conventional view of neural network pruning as solely a compression technique, demonstrating that one-shot magnitude pruning serves as a powerful implicit regularizer for ASR. Using Whisper-small, we combine gradient- and Fisher-based sensitivity diagnostics with targeted, component-wise pruning. This reveals architectural asymmetries: decoder FFNs are pruning-fragile, whereas decoder self-attention and the last encoder layers contain redundancy that, when removed, improves generalization. Without fine-tuning, pruning 50% of decoder self-attention reduces WER by 2.38% absolute (20.44% relative) on LibriSpeech test-other; pruning the last four encoder layers at 50% instead yields a 1.72% absolute (14.8% relative) improvement. These gains persist on the Common Voice and TED-LIUM datasets. Beyond regularization benefits, our sensitivity-aware approach enables more aggressive one-shot compression. At 40% sparsity, where established global pruning approaches fail catastrophically, our method preserves near-baseline accuracy. This positions pruning as a first-class architectural design tool: knowing where to prune is as important as how much to prune.
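As a rough illustration of targeted, component-wise one-shot magnitude pruning, the sketch below zeroes the smallest-magnitude weights only in parameters whose names match a chosen component, and includes a simple squared-gradient score of the kind often used as an empirical Fisher proxy. It is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: the parameter names, the toy weights, and the helper functions `fisher_sensitivity`, `magnitude_prune`, and `prune_components` are all hypothetical, and the abstract does not specify the exact form of the sensitivity scores.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of weights.
Params = Dict[str, List[float]]


def fisher_sensitivity(grads_per_batch: List[List[float]]) -> float:
    """Squared-gradient score averaged over batches (an empirical-Fisher-style proxy).

    Assumption: the paper's exact diagnostic may differ from this simple form.
    """
    total = sum(sum(g * g for g in grads) for grads in grads_per_batch)
    return total / len(grads_per_batch)


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of entries with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by |w|; the sort is stable, so ties keep their original order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_components(params: Params, targets: List[str], sparsity: float) -> Params:
    """One-shot pruning restricted to parameters whose name contains a target substring."""
    return {
        name: magnitude_prune(w, sparsity) if any(t in name for t in targets) else list(w)
        for name, w in params.items()
    }


# Toy example with hypothetical parameter names (not Whisper's actual module names).
params = {
    "decoder.layers.0.self_attn.q_proj": [0.9, -0.05, 0.3, 0.01],
    "decoder.layers.0.ffn.fc1": [0.2, -0.7, 0.04, 0.5],
}

# Prune 50% of decoder self-attention weights; leave the FFN untouched.
pruned = prune_components(params, targets=["self_attn"], sparsity=0.5)
```

In this toy setting, the two smallest-magnitude self-attention weights are zeroed while the FFN weights are copied unchanged, mirroring the idea that where pruning is applied is chosen per component rather than by a single global threshold.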