AS CL SD · Jul 13, 2018

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin

arXiv:1807.04978v2 · 10.8 · 41 citations

Originality: Incremental advance

AI Analysis

This work targets speech recognition accuracy in applications that must handle diverse, open-ended vocabularies, and represents an incremental improvement over existing hybrid CTC-Attention methods.

The paper tackles the out-of-vocabulary (OOV) problem in end-to-end speech recognition by using subword units in a hybrid CTC-Attention system. It achieves a 6.8% word error rate (WER) on the LibriSpeech test_clean subset, a 12.8% relative reduction compared with the character-based hybrid CTC-Attention system.
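A relative WER reduction is measured against the baseline's own WER, not in absolute percentage points. The short sketch below (the helper names are illustrative, not from the paper) works backward from the two reported figures. Because both inputs are rounded, the implied character-based WER of roughly 7.8% is an approximation derived here, not a number quoted in this summary.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction."""
    return new_wer / (1.0 - rel_reduction)


# Reported figures: 6.8% WER for the subword system, 12.8% relative reduction.
baseline = implied_baseline(6.8, 0.128)
print(f"implied character-based WER ~ {baseline:.1f}%")
# Round-trip check: recomputing the relative reduction recovers 12.8%.
print(f"relative reduction: {relative_reduction(baseline, 6.8):.3f}")
```

Under these assumptions the absolute gap is only about one percentage point, which is why WER improvements are usually reported in relative terms.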

In this paper, we present an end-to-end automatic speech recognition system that successfully employs subword units in a hybrid CTC-Attention based system. The subword units are obtained with the byte-pair encoding (BPE) compression algorithm. Compared with using words as modeling units, using characters or subword units does not suffer from the out-of-vocabulary (OOV) problem. Furthermore, subword units can model longer context than characters. We evaluate different systems on the 1000-hour LibriSpeech dataset. The subword-based hybrid CTC-Attention system obtains a 6.8% word error rate (WER) on the test_clean subset without any dictionary or external language model. This represents a significant improvement (a 12.8% relative WER reduction) over the character-based hybrid CTC-Attention system.
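To illustrate how BPE yields subword units and why it avoids the OOV problem, here is a minimal sketch of the generic merge-learning procedure. It is not the authors' implementation: the number of merges, the `</w>` end-of-word marker, the tie-breaking rule, and the function names are assumptions chosen for illustration, and the toy word counts stand in for a real transcript corpus such as LibriSpeech.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_counts, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word starts as a character sequence plus an (assumed) end-of-word marker.
    vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_symbols(list(symbols), best))] += freq
        vocab = dict(new_vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying merges in learned order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy counts
    merges, vocab = learn_bpe(corpus, num_merges=5)
    print(merges)
    # "lowest" never occurs in the toy corpus, yet it still gets a segmentation.
    print(apply_bpe("lowest", merges))
```

With these toy counts, five merges segment the unseen word "lowest" into `low` and `est</w>`, so no OOV token is needed, and each unit spans several characters, which reflects the longer-context argument above. Replaying merges in learned order is a common simplification of BPE encoding; production tools may apply merges by rank instead.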

