"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue
The paper addresses a specific issue for conversational agents, namely maintaining mutual understanding, but the contribution is incremental: it extends existing methods with multimodal features.
The paper tackles the problem of conversational agents failing to detect user repair initiation in dialogue. It proposes a multimodal model that integrates linguistic and prosodic features for Dutch dialogues, and its results show that prosodic cues significantly improve detection performance.
Maintaining mutual understanding is a key component of human-human conversation and helps avoid conversation breakdowns. Repair, and particularly Other-Initiated Repair (OIR, in which one speaker signals trouble and prompts the other to resolve it), plays a vital role in this. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess robustness and generalizability.
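The abstract does not spell out how the linguistic and prosodic features are combined, so the following is only a minimal sketch of one generic option, early fusion by concatenation followed by a logistic score. It should not be read as the paper's architecture. The function names, feature values, and weights are all hypothetical and chosen purely for illustration.

```python
import math
from typing import List, Sequence


def fuse_features(text_embedding: Sequence[float],
                  prosodic: Sequence[float]) -> List[float]:
    """Early fusion: concatenate text and prosodic feature vectors."""
    return list(text_embedding) + list(prosodic)


def repair_probability(features: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Logistic score over the fused vector (weights are assumed given)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: a tiny text embedding and two prosodic features
# (for example a normalised final pitch rise and a normalised turn duration).
text_embedding = [0.2, -0.1, 0.4]
prosodic = [1.5, 0.3]

fused = fuse_features(text_embedding, prosodic)

# Hypothetical weights; in practice these would be learned from labelled turns.
weights = [0.5, -0.2, 0.1, 0.8, -0.4]
p = repair_probability(fused, weights)
print(f"fused dimension: {len(fused)}, P(repair initiation) = {p:.3f}")
```

Concatenation is the simplest way to let a classifier weigh both modalities at once; it makes no claim about how the features interact, which is the question the paper itself investigates.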