CL · Nov 9, 2020

Efficient End-to-End Speech Recognition Using Performers in Conformers

arXiv:2011.04196v2 · 0.73 citations

Originality: Incremental advance

AI Analysis

This work addresses efficiency challenges for on-device speech recognition, though it is incremental as it builds on existing conformer and performer architectures.

The paper tackles on-device end-to-end speech recognition by reducing both model size and architectural complexity. It reports competitive performance on LibriSpeech with 10 million parameters and about a 20% relative improvement in word error rate over previous lightweight models.

On-device end-to-end speech recognition places high demands on model efficiency. Most prior work improves efficiency by reducing model size. We propose to reduce the complexity of the model architecture in addition to its size. More specifically, we reduce the floating-point operations in the conformer by replacing the transformer module with a performer. The proposed attention-based efficient end-to-end speech recognition model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear computational complexity. The proposed model also outperforms previous lightweight end-to-end models by about 20% relative in word error rate.
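To illustrate where the linear complexity comes from, here is a minimal NumPy sketch of Performer-style attention using positive random features. It is not the authors' implementation: the function names, shapes, single-head layout, and the plain Gaussian (non-orthogonal) projection are all assumptions made for illustration. The key point is that keys and values are summarized once into a small matrix, so the n-by-n attention matrix is never formed and the cost grows linearly with sequence length n.

```python
import numpy as np


def positive_random_features(x, proj):
    """Map x (n, d) to positive features (n, m) so that the dot product of
    two feature vectors approximates exp(q . k / sqrt(d)) in expectation."""
    m = proj.shape[0]
    x = x / x.shape[-1] ** 0.25          # split the 1/sqrt(d) scaling between q and k
    wx = x @ proj.T                      # (n, m) random projections
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(wx - sq_norm) / np.sqrt(m)


def linear_attention(q, k, v, proj):
    """Approximate softmax attention in O(n * m * d) time (hypothetical helper)."""
    q_f = positive_random_features(q, proj)   # (n, m)
    k_f = positive_random_features(k, proj)   # (n, m)
    kv = k_f.T @ v                            # (m, d_v): summary of keys and values
    z = k_f.sum(axis=0)                       # (m,): normalizer summary
    numerator = q_f @ kv                      # (n, d_v)
    denominator = q_f @ z                     # (n,)
    return numerator / denominator[:, None]


def softmax_attention(q, k, v):
    """Reference quadratic-cost attention, for comparison only."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Small comparison on random inputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
n, d, d_v, m = 16, 8, 4, 4096
q = 0.3 * rng.standard_normal((n, d))
k = 0.3 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d_v))
proj = rng.standard_normal((m, d))            # random projection, entries ~ N(0, 1)

approx = linear_attention(q, k, v, proj)
exact = softmax_attention(q, k, v)
max_abs_error = np.max(np.abs(approx - exact))
```

The approximation error shrinks as the number of random features m grows; in practice m is chosen much smaller than the sequence length so that the summary step remains cheaper than full attention.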
