SD AS Oct 25, 2019

Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition

Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee

arXiv:1910.11590v1 5.75 citations

Originality: Incremental advance

AI Analysis

This work addresses end-to-end speech recognition for Korean and Korean-English code-switching. The advance is incremental: it compares modeling-unit choices within an existing hybrid CTC/Attention framework.

The paper tackles the problem of selecting lexicon-free modeling units for end-to-end Korean and Korean-English code-switching speech recognition. Sequence-to-sequence learning with sub-words built on Korean syllables performed best on both Zeroth-Korean (51.6 hours) and Medical Record (2530 hours), without a lexicon or an extra language model.

As character-based end-to-end automatic speech recognition (ASR) models evolve, the choice of acoustic modeling units becomes important. Since Korean is a fairly phonetic language and has a unique writing system with its own Korean alphabet, it's worth investigating modeling units for an end-to-end Korean ASR task. In this work, we introduce lexicon-free modeling units in Korean, and explore them using a hybrid CTC/Attention-based encoder-decoder model. Five lexicon-free units are investigated: syllable-based Korean characters (with English characters for a code-switching task), Korean Jamo characters (with English characters), sub-words on syllable-based characters (with sub-words in English), sub-words on Jamo characters (with sub-words in English), and finally byte units, which are universal across languages. Experiments on Zeroth-Korean (51.6 hrs) and Medical Record (2530 hrs) are done for Korean and Korean-English code-switching ASR tasks, respectively. Sequence-to-sequence learning with sub-words based on Korean syllables (and sub-words in English) performs the best for both tasks without a lexicon or extra language model integration.
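
To make the unit types in the abstract concrete, here is a minimal Python sketch that splits one mixed Korean-English string into syllable, Jamo, and byte units. It is an illustration under stated assumptions, not the authors' preprocessing: the function names and the `<space>` marker are hypothetical, and Jamo are obtained through Unicode NFD decomposition, which may differ from the Jamo inventory the paper uses. The two sub-word variants are omitted because they require a vocabulary learned from training text.

```python
import unicodedata

SPACE = "<space>"  # hypothetical word-boundary marker, not taken from the paper


def syllable_units(text):
    """One unit per character: a Hangul syllable block or an English letter."""
    return [SPACE if ch == " " else ch for ch in text]


def jamo_units(text):
    """Split each Hangul syllable into its conjoining Jamo via NFD.

    Assumption: Unicode canonical decomposition stands in for whatever
    Jamo inventory the paper uses. English letters pass through unchanged.
    """
    units = []
    for ch in text:
        if ch == " ":
            units.append(SPACE)
        else:
            units.extend(unicodedata.normalize("NFD", ch))
    return units


def byte_units(text):
    """UTF-8 byte values; a language-independent inventory of at most 256 units."""
    return list(text.encode("utf-8"))


sample = "한국어 ASR"  # short code-switched example
for name, fn in [("syllable", syllable_units),
                 ("jamo", jamo_units),
                 ("byte", byte_units)]:
    units = fn(sample)
    print(name, len(units), units)
```

The sketch shows the trade-off the paper explores: moving from syllables to Jamo to bytes shrinks the unit inventory but lengthens the target sequence for the same utterance.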
