SD, CL, AS
Jul 2, 2018

Exploring End-to-End Techniques for Low-Resource Speech Recognition

Vladimir Bataev, Maxim Korenevsky, Ivan Medennikov, Alexander Zatvornitskiy

arXiv:1807.00868v1
7.15 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speech recognition for low-resource languages such as Turkish. It is incremental: it builds on existing end-to-end techniques with minor modifications.

The paper tackles low-resource recognition of Turkish spontaneous speech using an 80-hour dataset. Its best model achieves a word error rate of 45.8%, which the authors report as the best result for end-to-end systems using in-domain data on this task.

In this work we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigate the performance of different neural network architectures, including fully convolutional, recurrent, and ResNet with GRU. Different features and normalization techniques are compared as well. We also propose a modification of the CTC loss that uses segmentation during training, which leads to an improvement when decoding with a small beam size. Our best model achieves a word error rate of 45.8%, which is, to our knowledge, the best reported result for end-to-end systems using in-domain data for this task.
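For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring pipeline; the function name `word_error_rate` and the sample Turkish sentences are illustrative assumptions, and real evaluations typically apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokens are split on whitespace
    with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
example_wer = word_error_rate(
    "merhaba dünya nasılsın bugün",
    "merhaba dunya nasılsın",
)
```

Under this definition a value of 45.8% means roughly 46 word-level errors per 100 reference words. Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.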

