HC · Sep 30, 2017

Confirmation detection in human-agent interaction using non-lexical speech cues

Mara Brandt, Britta Wrede, Franz Kummert, Lars Schillingmann

arXiv:1710.00171v1 · 3.21 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more accurate intent interpretation in spoken dialog systems, particularly in assistance systems for elderly and cognitively impaired users. The contribution is incremental, as it applies an existing method (an SVM) to a specific domain.

The paper tackles the problem of detecting non-lexical confirmations such as 'mhm' in human-agent interaction to improve intent interpretation in spoken dialog systems, reaching an accuracy of 84% with stacked formants as features.

Even if only the acoustic channel is considered, human communication is highly multi-modal. Non-lexical cues provide a variety of information such as emotion or agreement. The ability to process such cues is highly relevant for spoken dialog systems, especially in assistance systems. In this paper, we focus on the recognition of non-lexical confirmations such as "mhm", as they enhance the system's ability to accurately interpret human intent in natural communication. The architecture uses a Support Vector Machine to detect confirmations based on acoustic features. In a systematic comparison, several feature sets were evaluated for their performance on a corpus of human-agent interaction in a setting with naive users, including elderly and cognitively impaired people. Our results show that using stacked formants as features yields an accuracy of 84%, outperforming regular formants and MFCC- or pitch-based features for online classification.
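The abstract names the ingredients (acoustic features, stacked formants, an SVM) but not their details. The following is a minimal sketch, not the authors' implementation, built on several assumptions: per-frame formant values (for example F1 to F3) have already been extracted by some formant tracker; "stacking" is read as concatenating a fixed window of consecutive frames into one feature vector; features are z-scored; and a linear SVM trained with the Pegasos sub-gradient method stands in for whatever SVM library and kernel the paper used. All function names and parameters (`stack_frames`, `context`, `lam`, `epochs`) are hypothetical.

```python
# Hypothetical sketch: stacked formant features + linear SVM (Pegasos).
# ASSUMPTIONS (not from the paper): formants per frame are already extracted,
# "stacking" = concatenating `context` consecutive frames, features are z-scored,
# and a linear SVM trained with Pegasos stands in for the authors' SVM setup.
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def stack_frames(frames: Sequence[Sequence[float]], context: int) -> List[Vector]:
    """Concatenate every window of `context` consecutive frames into one vector."""
    stacked: List[Vector] = []
    for start in range(len(frames) - context + 1):
        vec: Vector = []
        for frame in frames[start:start + context]:
            vec.extend(frame)
        stacked.append(vec)
    return stacked


def fit_standardizer(xs: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and standard deviation of the training vectors."""
    n, dims = len(xs), len(xs[0])
    means = [sum(x[d] for x in xs) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((x[d] - means[d]) ** 2 for x in xs) / n
        stds.append(var ** 0.5 or 1.0)  # guard against constant dimensions
    return means, stds


def standardize(x: Sequence[float], means: Vector, stds: Vector) -> Vector:
    """Z-score one vector using statistics estimated on the training set."""
    return [(xi - m) / s for xi, m, s in zip(x, means, stds)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Vector:
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels are +1 (confirmation) or -1 (anything else). A constant 1.0 feature
    is appended so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * _dot(w, x)
            # Gradient step for the regulariser: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step, only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Sequence[float]) -> int:
    """Return +1 (confirmation) or -1, using the bias weight stored last in w."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1
```

For online use, one would slide the same window over incoming formant frames, standardize each stacked vector with the training-set means and deviations, and call `predict` on it. The linear kernel and Pegasos solver are chosen here only because they fit in a few dependency-free lines; a kernel SVM from an established library is the more usual choice in practice.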
