Detecting agreement in multi-party dialogue: evaluating speaker diarisation versus a procedural baseline to enhance user engagement
This work addresses the challenge of improving conversational agents' dialogue state tracking, and thereby user engagement, in multi-party interactions. The contribution is incremental, since it compares existing methods rather than proposing a new one.
The study tackled the problem of detecting agreement in multi-party dialogue by comparing a diarisation model to a procedural baseline, finding that the procedural system was more accurate (0.44 vs. 0.28) and led to higher player engagement.
Conversational agents participating in multi-party interactions face significant challenges in dialogue state tracking, since the identity of the speaker adds important contextual meaning. Diarisation models are commonly used to identify the speaker. However, it is not clear whether these are accurate enough to correctly identify specific conversational events, such as agreement or disagreement, during a real-time interaction. This study uses a cooperative quiz, in which the conversational agent acts as quiz-show host, to determine whether diarisation or a frequency-and-proximity-based method is more accurate at detecting agreement, and whether this accuracy translates into greater engagement among the players. Experimental results show that our procedural system was more engaging to players and more accurate at detecting agreement, reaching an average accuracy of 0.44 compared to 0.28 for the diarised system.
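
The abstract does not specify how the frequency-and-proximity-based method works, so the following is only an illustrative sketch of what such a heuristic could look like, not the authors' implementation. It assumes that candidate answers have already been extracted from the transcript as (timestamp, answer) pairs. The function name `detect_agreement` and the parameters `window` and `min_share` are hypothetical. Each mention is weighted by how close it falls to the end of the discussion (proximity), the weights are summed per answer (frequency), and agreement is reported only if one answer holds a large enough share of the total weight. No speaker identity is used.

```python
from collections import defaultdict


def detect_agreement(mentions, window=10.0, min_share=0.6):
    """Guess the answer a group has settled on, without speaker labels.

    mentions  : list of (timestamp_seconds, answer) pairs (assumed input format)
    window    : how far back from the latest mention to look (hypothetical default)
    min_share : fraction of the weighted mass one answer needs (hypothetical default)

    Returns the agreed answer, or None if no answer dominates.
    """
    if not mentions:
        return None

    latest = max(t for t, _ in mentions)
    scores = defaultdict(float)

    for t, answer in mentions:
        age = latest - t
        if age <= window:
            # Proximity: linear decay, so the most recent mention counts 1.0
            # and a mention at the far edge of the window counts 0.0.
            # Frequency: repeated mentions of the same answer accumulate.
            scores[answer] += 1.0 - age / window

    total = sum(scores.values())  # at least 1.0, from the latest mention
    best = max(scores, key=scores.get)
    return best if scores[best] / total >= min_share else None


# Example: the group converges on one answer towards the end.
converging = [(0.0, "Paris"), (2.0, "Rome"), (6.0, "Paris"),
              (8.0, "Paris"), (9.0, "Paris")]
# Example: the group keeps alternating between two answers.
alternating = [(0.0, "Paris"), (1.0, "Rome"), (2.0, "Paris"), (3.0, "Rome")]

agreed = detect_agreement(converging)
undecided = detect_agreement(alternating)
```

In this sketch the first example yields an agreed answer while the alternating example yields none, because neither answer reaches the `min_share` threshold. A diarisation-based alternative would instead attribute each mention to a speaker and check whether distinct speakers endorse the same answer, which is the dependence on speaker identification that the study evaluates.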