LG AI
May 13, 2021

Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition

arXiv:2105.06253v2
4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses low-resource speech recognition for the Myanmar language. Its contribution is incremental: it applies existing methods to a new dataset.

The paper explores CTC-based end-to-end models for Myanmar speech recognition. Trained on a recorded corpus of nearly 26 hours, the best model achieves a character error rate (CER) of 4.72% and a syllable error rate (SER) of 12.38% on the test set.
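
For readers unfamiliar with the two metrics, the following is a minimal sketch of how character and syllable error rates are commonly computed: the edit (Levenshtein) distance between reference and hypothesis token sequences, divided by the reference length. The function names `edit_distance` and `error_rate` are illustrative and not taken from the paper, and the syllable tokens in the usage note are romanized placeholders; Myanmar syllable segmentation is assumed to happen in a separate step that is not shown.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # token in ref missing from hyp
            insertion = curr[j - 1] + 1  # extra token in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length.

    Pass lists of characters for CER, or lists of syllables for SER.
    """
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

Calling `error_rate(list(reference), list(hypothesis))` gives a character-level rate, while passing pre-segmented syllable lists such as `["mran", "ma", "sa"]` gives a syllable-level rate. Because a single wrong character makes the whole syllable wrong, SER is typically higher than CER for the same output, which is consistent with the figures reported above.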

In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. A series of experiments is presented on the topology of the model in which the convolutional layers are added and dropped, different depths of bidirectional long short-term memory (BLSTM) layers are used and different label encoding methods are investigated. The experiments are carried out in low-resource scenarios using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves character error rate (CER) of 4.72% and syllable error rate (SER) of 12.38% on the test set.
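
As a rough illustration of how a CTC-trained model turns per-frame outputs into a label sequence, the sketch below shows greedy (best-path) CTC decoding: take the highest-scoring label in each frame, collapse consecutive repeats, then drop the blank symbol. This is a generic technique; the abstract does not state which decoding method the authors used. The function name `ctc_greedy_decode` and the choice of index 0 for the blank label are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding over a list of per-frame score vectors.

    frame_scores: one list of scores per time step, one score per label.
    Returns the decoded list of label indices.
    """
    # 1. Pick the highest-scoring label in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks, which only mark "no output" or separate repeated labels.
    return [label for label in collapsed if label != blank]
```

For example, a best path of `1 1 0 1 2 2 0` decodes to `[1, 1, 2]`: the blank between the two runs of label 1 is what allows a repeated label to survive the collapse step. The label indices would map to characters or syllables depending on which label encoding is used.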

