HC · Sep 30, 2017

Confirmation detection in human-agent interaction using non-lexical speech cues

arXiv:1710.00171v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more accurate intent interpretation in spoken dialog systems, particularly in assistance systems for elderly and cognitively impaired users. It is incremental, however, in that it applies an existing method (a Support Vector Machine) to a specific domain.

The paper tackles the problem of detecting non-lexical confirmations such as 'mhm' in human-agent interaction in order to improve intent interpretation in spoken dialog systems, and reports an accuracy of 84% using stacked formants as features.

Even if only the acoustic channel is considered, human communication is highly multi-modal. Non-lexical cues provide a variety of information such as emotion or agreement. The ability to process such cues is highly relevant for spoken dialog systems, especially in assistance systems. In this paper we focus on the recognition of non-lexical confirmations such as "mhm", as they enhance the system's ability to accurately interpret human intent in natural communication. The architecture uses a Support Vector Machine to detect confirmations based on acoustic features. In a systematic comparison, several feature sets were evaluated for their performance on a corpus of human-agent interaction in a setting with naive users, including elderly and cognitively impaired people. Our results show that using stacked formants as features yields an accuracy of 84%, outperforming regular formants and MFCC- or pitch-based features for online classification.
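
As an illustration of the feature idea, the following is a minimal sketch that assumes "stacked formants" means concatenating the formant values of several consecutive analysis frames into one feature vector. The abstract does not define the term, so this reading is an assumption, and the function name `stack_frames`, the context length, and the formant values below are hypothetical. The resulting vectors are the kind of input that could then be passed to a classifier such as a Support Vector Machine.

```python
from typing import List, Sequence


def stack_frames(frames: Sequence[Sequence[float]], context: int) -> List[List[float]]:
    """Concatenate every run of `context` consecutive frames into one vector.

    Assumption: this is one plausible reading of "stacked" features,
    not necessarily the construction used in the paper.
    """
    if context < 1:
        raise ValueError("context must be >= 1")
    stacked = []
    # Slide a window of `context` frames over the sequence, one frame at a time.
    for start in range(len(frames) - context + 1):
        window = frames[start:start + context]
        # Flatten the window: frame 0 values, then frame 1 values, and so on.
        stacked.append([value for frame in window for value in frame])
    return stacked


# Hypothetical formant track: one row per analysis frame, columns are (F1, F2) in Hz.
formants = [
    [300.0, 1200.0],
    [310.0, 1190.0],
    [320.0, 1180.0],
    [330.0, 1170.0],
]

stacked = stack_frames(formants, context=3)
```

With four frames and a context of three, this produces two stacked vectors of six values each. A larger context captures more of the temporal shape of a sound like "mhm" but delays the decision by more frames, which matters for online classification.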

