RO, AI, CL, LG | Jun 1, 2025

Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody

arXiv:2506.02057v1 | h-index: 4 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This paper addresses a core challenge in human-robot communication: resolving ambiguous spoken instructions. The advance is incremental because it builds on existing methods, although the way it integrates them is novel.

The paper tackles the problem of robots accurately interpreting spoken language instructions by leveraging speech prosody to infer intent, achieving 95.79% accuracy in detecting referent intents and 71.96% accuracy in determining task plans for ambiguous instructions.

Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
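The abstract does not give the prompt format used for in-context learning, so the following is only a minimal sketch of the general idea: words that a prosody model has flagged as referent intents are marked in the transcript, and a few solved demonstrations precede the new instruction in the prompt. Every name here (`Example`, `mark_emphasis`, `build_prompt`), the asterisk marking, and the sample instruction are hypothetical illustrations, not the authors' implementation or data. The prosody model and the language model call are both out of scope.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Example:
    """One in-context demonstration (hypothetical format, not from the paper)."""
    transcript: str
    emphasized: Sequence[str]  # words assumed to be flagged by a prosody model
    plans: Sequence[str]       # candidate task plans for the instruction
    answer: int                # 1-based index of the intended plan


def mark_emphasis(transcript: str, emphasized: Sequence[str]) -> str:
    """Wrap each flagged word in asterisks so the prosodic cue survives as text."""
    flagged = {w.lower() for w in emphasized}
    out: List[str] = []
    for token in transcript.split():
        core = token.strip(".,!?")  # compare without surrounding punctuation
        if core and core.lower() in flagged:
            token = token.replace(core, f"*{core}*", 1)
        out.append(token)
    return " ".join(out)


def format_case(transcript: str, emphasized: Sequence[str], plans: Sequence[str]) -> str:
    """Render one instruction with its numbered candidate plans."""
    lines = [f"Instruction: {mark_emphasis(transcript, emphasized)}", "Candidate plans:"]
    lines += [f"{i}. {plan}" for i, plan in enumerate(plans, start=1)]
    return "\n".join(lines)


def build_prompt(examples: Sequence[Example], transcript: str,
                 emphasized: Sequence[str], plans: Sequence[str]) -> str:
    """Assemble a few-shot prompt: header, solved demonstrations, then the open query."""
    header = ("Words between asterisks were stressed by the speaker. "
              "Choose the plan the speaker intends.")
    blocks = [header]
    for ex in examples:
        blocks.append(format_case(ex.transcript, ex.emphasized, ex.plans)
                      + f"\nAnswer: {ex.answer}")
    blocks.append(format_case(transcript, emphasized, plans) + "\nAnswer:")
    return "\n\n".join(blocks)


# Invented demonstration and query, for illustration only.
demo = Example(
    transcript="Bring the red cup.",
    emphasized=["red"],
    plans=["Fetch the red cup, not the blue one", "Fetch the red cup, not the red bowl"],
    answer=1,
)
prompt = build_prompt(
    [demo],
    transcript="Bring the red cup.",
    emphasized=["cup"],
    plans=["Fetch the red cup, not the blue one", "Fetch the red cup, not the red bowl"],
)
print(prompt)
```

The resulting string would be sent to whichever language model is in use. Keeping the emphasis inside the instruction text, rather than in a separate field, is one design choice among several and is assumed here only for simplicity.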

