Advances in All-Neural Speech Recognition
This work targets recognition accuracy on conversational telephone speech, offering an incremental advance through specific technical improvements to CTC-based recognizers.
The paper improves CTC-based all-neural speech recognition by proposing a novel symbol inventory and an iterated-CTC method, and reports significantly better performance on the NIST 2000 conversational telephony test set than previously published similar systems.
This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly used NIST 2000 conversational telephony test set, and significantly exceed the previously published performance of similar systems, both with and without the use of an external language model and decoding technology.
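For readers unfamiliar with CTC, the following minimal sketch illustrates the standard CTC output rule that such recognizers rely on: repeated frame labels are merged, then blank symbols are dropped. It is not the paper's system. The blank symbol `_`, the toy symbol list, and the function names `ctc_collapse` and `greedy_decode` are assumptions made for illustration; the proposed symbol inventory and the iterated-CTC second pass are not reproduced here.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Standard CTC collapse: merge runs of identical labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(frame_scores, symbols, blank=BLANK):
    """Take the highest-scoring symbol in each frame, then apply the collapse rule.

    frame_scores: one list of per-symbol scores for each frame.
    symbols: the symbol for each score position (toy inventory, an assumption).
    """
    best_per_frame = [
        symbols[max(range(len(row)), key=row.__getitem__)] for row in frame_scores
    ]
    return "".join(ctc_collapse(best_per_frame, blank))


if __name__ == "__main__":
    # A blank between the two runs of "l" keeps the double letter.
    print("".join(ctc_collapse(list("hh_e_ll_lo_"))))
```

Greedy per-frame decoding of this kind uses no external language model, which corresponds loosely to the "without an external language model" condition mentioned in the abstract; the paper's actual decoding setup may differ.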