AS · CL · SD · Jul 13, 2018

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

arXiv:1807.04978v2
41 citations
Originality: Incremental advance
AI Analysis

This work improves end-to-end speech recognition accuracy on open vocabularies by replacing character targets with subword units. It is an incremental improvement over existing hybrid CTC-Attention methods.

The paper addresses the out-of-vocabulary problem in end-to-end speech recognition by using subword units in a hybrid CTC-Attention system. The system achieves a 6.8% word error rate (WER) on the LibriSpeech test_clean subset, a 12.8% relative WER reduction compared to the character-based hybrid CTC-Attention system.
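The relative reduction quoted above can be unpacked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the baseline value it prints is derived from the two reported figures (6.8% WER, 12.8% relative reduction), not quoted from the source.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction."""
    return new_wer / (1.0 - rel_reduction)


# Reported figures: 6.8% WER after a 12.8% relative reduction.
baseline = implied_baseline(6.8, 0.128)
print(f"implied character-based WER: {baseline:.2f}%")

# Sanity check: going from the implied baseline back to 6.8% WER
# should recover the reported relative reduction.
print(f"relative reduction: {relative_reduction(baseline, 6.8):.3f}")
```

A relative reduction is measured against the baseline error rate, so 12.8% relative corresponds to roughly one absolute percentage point of WER here, not 12.8 points.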

In this paper, we present an end-to-end automatic speech recognition system, which successfully employs subword units in a hybrid CTC-Attention based system. The subword units are obtained by the byte-pair encoding (BPE) compression algorithm. Compared to using words as modeling units, using characters or subword units does not suffer from the out-of-vocabulary (OOV) problem. Furthermore, subword units can model longer context than characters. We evaluate different systems over the LibriSpeech 1000h dataset. The subword-based hybrid CTC-Attention system obtains 6.8% word error rate (WER) on the test_clean subset without any dictionary or external language model. This represents a significant improvement (a 12.8% relative WER reduction) over the character-based hybrid CTC-Attention system.
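To make the BPE step concrete, here is a minimal sketch of learning subword merges from word frequencies. It follows the commonly described greedy procedure (repeatedly merge the most frequent adjacent symbol pair). The abstract does not say which BPE implementation or how many merges the authors used, so the function names, the `</w>` end-of-word marker, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Return the ordered list of merges and the final segmented vocabulary."""
    # Start from characters; "</w>" is an assumed end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


# Toy corpus (hypothetical), not data from the paper.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_freqs, num_merges=3)
print(merges)
print(list(vocab))
```

Because the starting inventory contains every character, any unseen word can still be segmented into known units, which is why character and subword systems avoid the OOV problem. The merged units span several characters each, which is how subwords cover longer context per output step.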

