CL Aug 7, 2018

Dialog-context aware end-to-end speech recognition

arXiv:1808.02171v15.148 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of processing conversational speech for applications like voice assistants. It is an incremental advance, since it builds on existing end-to-end methods.

The paper tackles the problem of improving speech recognition accuracy in long conversations by integrating dialog context into end-to-end models, and shows that the proposed system outperforms a sentence-level baseline on the Switchboard corpus.

Existing speech recognition systems are typically built at the sentence level, although it is known that dialog context, e.g. higher-level knowledge that spans across sentences or speakers, can help the processing of long conversations. The recent progress in end-to-end speech recognition systems promises to integrate all available information (e.g. acoustic, language resources) into a single model, which is then jointly optimized. It seems natural that such dialog context information should thus also be integrated into the end-to-end models to further improve recognition accuracy. In this work, we present a dialog-context aware speech recognition model, which explicitly uses context information beyond sentence-level information, in an end-to-end fashion. Our dialog-context model captures a history of sentence-level context so that the whole system can be trained with dialog-context information in an end-to-end manner. We evaluate our proposed approach on the Switchboard conversational speech corpus and show that our system outperforms a comparable sentence-level end-to-end speech recognition system.
