CL, AI · Dec 13, 2021

Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding

Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel

arXiv:2112.06743v11.010 citations · h-index: 18

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of improving accuracy in task-oriented voice assistants by leveraging contextual signals from earlier dialogue turns. It represents an incremental advance over existing methods.

The paper tackles the problem of incorporating dialogue history into end-to-end spoken language understanding for multi-turn dialogues. On a large de-identified voice assistant dataset, it reduces average word error rate by 10.8% and average semantic error rate by 12.6%.

Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-oriented dialogues. In this work, we propose a contextual E2E SLU model architecture that uses a multi-head attention mechanism over encoded previous utterances and dialogue acts (actions taken by the voice assistant) of a multi-turn dialogue. We detail alternative methods to integrate these contexts into the state-of-the-art recurrent and transformer-based models. When applied to a large de-identified dataset of utterances collected by a voice assistant, our method reduces average word and semantic error rates by 10.8% and 12.6%, respectively. We also present results on a publicly available dataset and show that our method significantly improves performance over a noncontextual baseline.
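To make the central idea concrete, here is a minimal sketch of multi-head attention from a current-turn encoding over a set of encoded context items (previous utterances and dialogue acts). It is an illustration under stated assumptions, not the authors' implementation: the function names, the dimensions, the random stand-in weights, and the concatenation-based fusion at the end are all hypothetical, and the output projection that usually follows the heads is omitted for brevity.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection matrix (hypothetical weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_context_attention(query, context, num_heads, rng):
    """Attend from the current-turn encoding over encoded context items.

    query:   list[float] of length d (encoding of the current utterance)
    context: list of list[float], each of length d (encoded previous
             utterances and dialogue acts)
    Returns (attended vector of length d, per-head attention weights).
    """
    d = len(query)
    assert d % num_heads == 0, "model dimension must divide evenly into heads"
    head_dim = d // num_heads

    attended, all_weights = [], []
    for _ in range(num_heads):
        # Per-head query/key/value projections (random here, learned in practice).
        w_q = random_matrix(head_dim, d, rng)
        w_k = random_matrix(head_dim, d, rng)
        w_v = random_matrix(head_dim, d, rng)

        q = matvec(w_q, query)
        keys = [matvec(w_k, c) for c in context]
        values = [matvec(w_v, c) for c in context]

        # Scaled dot-product attention over the context items.
        scale = math.sqrt(head_dim)
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        head_out = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(head_dim)
        ]

        attended.extend(head_out)  # concatenate the heads
        all_weights.append(weights)
    return attended, all_weights


rng = random.Random(0)
d_model, n_heads = 8, 2
current_turn = [rng.gauss(0, 1) for _ in range(d_model)]
# Three context items, e.g. two previous utterances and one dialogue act.
context_items = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(3)]

context_vector, attention = multi_head_context_attention(
    current_turn, context_items, n_heads, rng
)
# One simple (assumed) integration: concatenate context with the current encoding.
fused = current_turn + context_vector
```

Each head yields one weight per context item, so inspecting `attention` shows which earlier turn or dialogue act a head attends to most. Concatenation is only one possible way to combine the attended context with the current-turn encoding; the abstract states that the paper details alternative integration methods, which this sketch does not reproduce.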
