SD LG AS

Oct 1, 2022

Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

arXiv:2210.00169v1, 9.415 citations, h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the need for smaller ASR models on smart devices and represents an incremental improvement in compression techniques.

The authors reduce model size for on-device speech recognition with a multi-stage progressive compression approach that applies knowledge distillation to a conformer transducer model. On the LibriSpeech dataset, the approach achieves compression rates above 60% without significant performance degradation.

The limited memory bandwidth of smart devices motivates the development of smaller Automatic Speech Recognition (ASR) models. One way to obtain a smaller model is to apply model compression techniques. Knowledge distillation (KD) is a popular model compression approach that has been shown to yield smaller models with relatively little degradation in performance. In this approach, knowledge is distilled from a trained large teacher model into a smaller student model. In addition, transducer-based models have recently been shown to perform well on the on-device streaming ASR task, while conformer models handle long-term dependencies efficiently. Hence, in this work, we employ a streaming transducer architecture with a conformer as the encoder. We propose a multi-stage progressive approach to compress the conformer transducer model using KD: at each stage, the distilled student model becomes the teacher model for the next stage. On the standard LibriSpeech dataset, our approach achieves compression rates greater than 60% without significant degradation in performance compared to the larger teacher model.
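To make the multi-stage idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The loss is a generic temperature-softened KL distillation loss, used here as an assumption because the abstract does not state the exact KD objective for the transducer. The names `kd_loss`, `stage_sizes`, and `overall_compression`, and the per-stage keep ratios, are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on softened distributions.

    Assumption: a generic distillation loss, not the paper's exact objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def stage_sizes(teacher_params, keep_ratios):
    """Parameter count after each stage.

    Each stage distills a student that keeps `keep` of the current teacher's
    parameters; that student then serves as the teacher for the next stage.
    The keep ratios are hypothetical inputs, not values from the paper.
    """
    sizes = []
    current = teacher_params
    for keep in keep_ratios:
        current = current * keep
        sizes.append(current)
    return sizes


def overall_compression(teacher_params, final_params):
    """Fraction of the original parameters removed."""
    return 1.0 - final_params / teacher_params
```

As an illustration with made-up numbers, three stages that each keep 70% of the previous model leave 0.7^3 = 34.3% of the original parameters, that is, about 65.7% overall compression. `kd_loss` is zero when student and teacher logits agree and positive otherwise, which is the signal a student would minimize at every stage.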
