CL, LG, SD, AS | Dec 3, 2020

End to End ASR System with Automatic Punctuation Insertion

arXiv:2012.02012v1 | 6 citations
Originality: Incremental advance
AI Analysis

This work is significant for improving the readability and usability of ASR outputs for general users by automatically inserting punctuation, which is an incremental improvement to existing end-to-end ASR systems.

This paper addresses the lack of punctuation in end-to-end Automatic Speech Recognition (ASR) systems by proposing a method to generate punctuated transcripts for the TEDLIUM dataset and an end-to-end ASR system that concurrently outputs words and punctuation. The proposed model significantly reduces the slot error rate from 0.497 to 0.341 compared to previous methods.

Recent Automatic Speech Recognition (ASR) systems have been moving towards end-to-end systems that can be trained jointly. Numerous recently proposed techniques have enabled this trend, including feature extraction with CNNs, context capturing and acoustic feature modeling with RNNs, automatic alignment of input and output sequences using Connectionist Temporal Classification (CTC), and the replacement of traditional n-gram language models with RNN language models. Historically, there has been considerable interest in automatic punctuation in textual and speech-to-text contexts. However, there seems to be little interest in incorporating automatic punctuation into the emerging neural-network-based end-to-end speech recognition systems, partly due to the lack of English speech corpora with punctuated transcripts. In this study, we propose a method to generate punctuated transcripts for the TEDLIUM dataset using transcripts available from ted.com. We also propose an end-to-end ASR system that outputs words and punctuation concurrently from speech signals. By combining Damerau-Levenshtein distance and slot error rate into DLev-SER, we enable measurement of punctuation error rate when the hypothesis text is not perfectly aligned with the reference. Compared with previous methods, our model reduces slot error rate from 0.497 to 0.341.
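
As a rough illustration of the two ingredients named in the abstract, the sketch below computes a token-level Damerau-Levenshtein distance (in its optimal string alignment variant) and a slot error rate using the common definition (substitutions + deletions + insertions) / reference slots. The abstract does not say how the paper combines the two into DLev-SER, so that step is deliberately left out. The function names and example sentences are hypothetical, not taken from the paper or its code.

```python
from typing import Sequence


def osa_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level Damerau-Levenshtein distance (optimal string alignment variant).

    Counts insertions, deletions, substitutions, and swaps of adjacent tokens.
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = distance between the first i reference tokens
    # and the first j hypothesis tokens.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent tokens.
            if (
                i > 1
                and j > 1
                and ref[i - 1] == hyp[j - 2]
                and ref[i - 2] == hyp[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def slot_error_rate(
    substitutions: int, deletions: int, insertions: int, reference_slots: int
) -> float:
    """Slot error rate under the common definition: (S + D + I) / reference slots."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


# Hypothetical example: the hypothesis drops the comma after "hello".
reference = "hello , world .".split()
hypothesis = "hello world .".split()
distance = osa_distance(reference, hypothesis)
# One deleted punctuation mark out of two reference punctuation slots.
ser = slot_error_rate(substitutions=0, deletions=1, insertions=0, reference_slots=2)
```

Working on tokens rather than characters lets punctuation marks be treated as ordinary symbols in the alignment, which is one plausible way to score punctuation when the recognized words do not match the reference exactly.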

Code Implementations: 1 repo