CL · Nov 9, 2020

Efficient End-to-End Speech Recognition Using Performers in Conformers

arXiv:2011.04196v2 · 3 citations
AI Analysis

This work addresses efficiency challenges for on-device speech recognition, though it is incremental as it builds on existing conformer and performer architectures.

The paper tackles on-device end-to-end speech recognition by reducing both architectural complexity and model size. It reports competitive performance on LibriSpeech with 10 million parameters and a relative word error rate improvement of about 20% over previous lightweight models.

On-device end-to-end speech recognition places high demands on model efficiency. Most prior work improves efficiency by reducing model size. We propose to reduce the complexity of the model architecture in addition to the model size. More specifically, we reduce the floating-point operations in the conformer by replacing the transformer module with a performer. The proposed attention-based efficient end-to-end speech recognition model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear computational complexity. The proposed model also outperforms previous lightweight end-to-end models by about 20% relative in word error rate.
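To illustrate where the linear complexity comes from, here is a minimal NumPy sketch of Performer-style attention using positive random features (the FAVOR+ idea), next to standard softmax attention for comparison. It is not the authors' implementation: it assumes a single head, no causal masking, and unstructured Gaussian projections rather than orthogonal ones, and the function names (`positive_random_features`, `performer_attention`, `softmax_attention`) and the `num_features` parameter are hypothetical choices made for this example.

```python
import numpy as np


def positive_random_features(x, omega):
    """Map rows of x (n, d) to positive features (n, m) using projections omega (m, d).

    phi(x) = exp(omega @ x - ||x||^2 / 2) / sqrt(m), so that
    phi(q) . phi(k) approximates exp(q . k) in expectation.
    """
    m = omega.shape[0]
    proj = x @ omega.T                                      # (n, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)  # (n, 1)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)


def performer_attention(q, k, v, num_features=256, seed=0):
    """Approximate softmax attention in time linear in the sequence length n.

    q, k: (n, d); v: (n, d_v). Assumption: single head, non-causal.
    """
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((num_features, d))  # plain Gaussian projections

    # Split the usual 1/sqrt(d) scaling evenly between queries and keys.
    scale = d ** -0.25
    q_feat = positive_random_features(q * scale, omega)  # (n, m)
    k_feat = positive_random_features(k * scale, omega)  # (n, m)

    # Key step: summarize keys and values first, so no (n, n) matrix is formed.
    kv = k_feat.T @ v             # (m, d_v), cost O(n * m * d_v)
    k_sum = k_feat.sum(axis=0)    # (m,)

    numerator = q_feat @ kv       # (n, d_v)
    denominator = q_feat @ k_sum  # (n,), strictly positive
    return numerator / denominator[:, None]


def softmax_attention(q, k, v):
    """Reference quadratic-cost attention: builds the full (n, n) score matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The reordering `q_feat @ (k_feat.T @ v)` is what removes the quadratic term: cost grows with `n * num_features` instead of `n * n`. Calling both functions on the same small random `q`, `k`, `v` and comparing the outputs gives a feel for how the approximation tightens as `num_features` grows.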

