AS IR LG SD

Dec 11, 2019

Leveraging End-to-End Speech Recognition with Neural Architecture Search

Ahmed Baruwa, Mojeed Abisiga, Ibrahim Gbadegesin, Afeez Fakunle

arXiv:1912.05946v2

16 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of high computational cost in speech recognition for researchers and practitioners by offering a faster method with competitive accuracy.

The paper uses neural architecture search to optimize deep speech models for automatic speech recognition efficiently. It reports a 7% word error rate on LibriSpeech and a 13% phone error rate on TIMIT, which the authors describe as on par with state-of-the-art results.

Deep neural networks (DNNs) have been demonstrated to outperform many traditional machine learning algorithms in Automatic Speech Recognition (ASR). In this paper, we show that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization at a very low computational cost. Phone recognition tests on the popular LibriSpeech and TIMIT benchmarks support this: novel candidate models were discovered and trained within a few hours (less than a day), many times faster than attention-based seq2seq models. Our method achieves a test error of 7% Word Error Rate (WER) on the LibriSpeech corpus and 13% Phone Error Rate (PER) on the TIMIT corpus, on par with state-of-the-art results.
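
For readers unfamiliar with the two metrics quoted above, WER and PER are conventionally computed as the token-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference tokens (words for WER, phones for PER). The sketch below illustrates that standard definition only. It is not the authors' scoring code, and the function names (edit_distance, error_rate, word_error_rate) are hypothetical names chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, deletions, and
    insertions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 ref tokens
    # and the first j hyp tokens; start with the empty-ref row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from i ref tokens to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length.

    Gives WER when tokens are words and PER when tokens are phones.
    """
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(reference, hypothesis):
    """WER for whitespace-tokenized transcripts (an assumed tokenization)."""
    return error_rate(reference.split(), hypothesis.split())
```

Under this definition, a hypothesis that drops one word from a six-word reference scores a WER of 1/6, and a reported 7% WER corresponds to roughly seven word-level errors per hundred reference words. Corpus-level scores are normally computed by summing errors and reference lengths over all utterances rather than averaging per-utterance rates.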

