HC | Mar 23

RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue

arXiv:2603.21682 | 73.1 | h-index: 31
AI Analysis

This work addresses the need for more natural and engaging voice interfaces for conversational agents. The contribution is incremental, since it builds on existing streaming ASR and incremental semantics.

The authors tackled the problem of stiff, robotic turn-taking in voice-based conversational agents by introducing RESPOND, a framework that enables timely backchannels and proactive turn claims, resulting in more fluid and listener-aware dialogue.

The majority of voice-based conversational agents still rely on pause-and-respond turn-taking, leaving interactions sounding stiff and robotic. We present RESPOND (Responsive Engagement Strategy for Predictive Orchestration and Dialogue), a framework that brings two staples of human conversation to agents: timely backchannels ("mm-hmm," "right") and proactive turn claims that can contribute relevant content before the speaker yields the conversational floor. Built on streaming ASR (Automatic Speech Recognition) and incremental semantics, RESPOND continuously predicts both when and how to interject, enabling fluid, listener-aware dialogue. A defining feature is its designer-facing controllability: two orthogonal dials, Backchannel Intensity (frequency of acknowledgments) and Turn Claim Aggressiveness (depth and assertiveness of early contributions), can be tuned to match the etiquette of contexts ranging from rapid ideation to reflective counseling. By coupling predictive orchestration with explicit control, RESPOND offers a practical path toward conversational agents that adapt their conversational footprint to social expectations, advancing the design of more natural and engaging voice interfaces.
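The two dials described in the abstract can be pictured as thresholds on a predictor's per-step scores. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Dials`, `Prediction`, `Action`, and `decide`, the [0, 1] scale, and the thresholding rule are all hypothetical, and the scores are assumed to come from some upstream model running on streaming ASR output.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    BACKCHANNEL = "backchannel"
    CLAIM_TURN = "claim_turn"


@dataclass(frozen=True)
class Dials:
    """Designer-facing controls, each in [0, 1] (hypothetical scale)."""
    backchannel_intensity: float
    turn_claim_aggressiveness: float

    def __post_init__(self):
        for name in ("backchannel_intensity", "turn_claim_aggressiveness"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


@dataclass(frozen=True)
class Prediction:
    """Per-step scores from an assumed upstream predictor, each in [0, 1]."""
    backchannel_score: float  # how suitable this moment is for "mm-hmm"
    claim_score: float        # how suitable it is to take the floor early


def decide(pred: Prediction, dials: Dials) -> Action:
    """Map one prediction to an action (assumed rule: dial lowers threshold)."""
    # A higher dial value lowers the threshold, so the agent interjects more.
    claim_threshold = 1.0 - dials.turn_claim_aggressiveness
    backchannel_threshold = 1.0 - dials.backchannel_intensity

    # A dial set to 0 disables that behavior entirely.
    if dials.turn_claim_aggressiveness > 0 and pred.claim_score >= claim_threshold:
        return Action.CLAIM_TURN
    if dials.backchannel_intensity > 0 and pred.backchannel_score >= backchannel_threshold:
        return Action.BACKCHANNEL
    return Action.WAIT


# Two illustrative presets for the contexts named in the abstract.
counseling = Dials(backchannel_intensity=0.6, turn_claim_aggressiveness=0.1)
ideation = Dials(backchannel_intensity=0.5, turn_claim_aggressiveness=0.8)

pred = Prediction(backchannel_score=0.5, claim_score=0.4)
print(decide(pred, counseling))  # Action.BACKCHANNEL
print(decide(pred, ideation))    # Action.CLAIM_TURN
```

Because the two dials are read independently, the same prediction yields a quiet acknowledgment under the counseling preset and an early turn claim under the ideation preset, which is one way the orthogonality described above might play out in practice.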

