AS CL LG SD | Jun 15, 2020

Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset

Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

arXiv:2006.08274v2 | 14 citations | Code available

AI Analysis

This work addresses Russian-language speech recognition by comparing methods on a large open dataset. It is incremental, since it applies existing techniques to new data.

The paper addresses automatic speech recognition for Russian using the OpenSTT dataset, evaluating end-to-end models against a hybrid system. The best end-to-end model achieves word error rates (WER) of 34.8%, 19.1%, and 18.1% on the phone calls, YouTube, and books validation sets, respectively, compared with 33.5%, 20.9%, and 18.6% for the hybrid system.
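Word error rate, the metric behind all of these numbers, is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level Levenshtein alignment. The sketch below shows that standard definition only. It assumes plain whitespace tokenization; the function name `word_error_rate` is hypothetical, and the text normalization used by the authors is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words.

    Assumes whitespace tokenization and no further text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words -> WER = 1/3.
    print(word_error_rate("мама мыла раму", "мама мыла рамы"))
```

A reported WER of 18.1% therefore means roughly 18 word errors per 100 reference words, summed over the whole validation set rather than averaged per utterance.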

This paper presents an exploration of end-to-end automatic speech recognition (ASR) systems for the largest open-source Russian-language dataset, OpenSTT. We evaluate several existing end-to-end approaches: joint CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with a strong hybrid ASR system based on an LF-MMI TDNN-F acoustic model. For the three available validation sets (phone calls, YouTube, and books), our best end-to-end model achieves a word error rate (WER) of 34.8%, 19.1%, and 18.1%, respectively. Under the same conditions, the hybrid ASR system achieves 33.5%, 20.9%, and 18.6% WER.
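To make the end-to-end versus hybrid comparison concrete, the short script below takes the six WER values stated in the abstract and computes the absolute and relative differences per validation set. Only the arithmetic is added; the dictionary layout and the `compare` helper are illustrative names, not part of the paper's code. A negative difference means the end-to-end model has the lower (better) WER.

```python
# WER (%) on the three OpenSTT validation sets, as stated in the abstract.
REPORTED_WER = {
    "phone calls": {"end_to_end": 34.8, "hybrid": 33.5},
    "YouTube": {"end_to_end": 19.1, "hybrid": 20.9},
    "books": {"end_to_end": 18.1, "hybrid": 18.6},
}


def compare(results):
    """Return (set name, absolute diff in WER points, relative diff in %).

    Differences are end-to-end minus hybrid, so negative values favor
    the end-to-end model. Relative difference is taken w.r.t. the hybrid WER.
    """
    rows = []
    for name, wer in results.items():
        absolute = wer["end_to_end"] - wer["hybrid"]
        relative = 100.0 * absolute / wer["hybrid"]
        rows.append((name, round(absolute, 1), round(relative, 1)))
    return rows


if __name__ == "__main__":
    for name, absolute, relative in compare(REPORTED_WER):
        print(f"{name:12s} {absolute:+.1f} WER points  {relative:+.1f}% relative")
```

By this arithmetic the best end-to-end model trails the hybrid system by 1.3 WER points (about 3.9% relative) on phone calls, but is ahead by 1.8 points (about 8.6% relative) on YouTube and by 0.5 points (about 2.7% relative) on books.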
