HC · Mar 15

Tap-to-Adapt: Learning User-Aligned Response Timing for Speech Agents

arXiv:2603.14449 · 50.6 · h-index: 3
Predicted impact: top 34% in HC · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses response timing for interactive speech agents, but it appears incremental, as it builds on prior work in turn modeling and voice wake-up.

The paper aims to align a speech agent's response timing with user intent. It proposes the Tap-to-Adapt framework, which uses tap interactions to generate labels for online learning, and reports data-driven experiments and user studies with 20 participants and approximately 20,000 samples.

Response timing judgment is a critical component of interactive speech agents. Although there exists substantial prior work on turn modeling and voice wake-up, there is a lack of research on response timing judgments continuously aligned with user intent. To address this, we propose the Tap-to-Adapt framework, which enables users to naturally activate or interrupt the agent via tap interactions to construct online learning labels for response timing models. Under this framework, Dilated TCN and a sequential replay strategy play significant roles, as demonstrated through data-driven experiments and user studies. Additionally, we develop an evaluation and continuous data mining system tailored for the Tap-to-Adapt framework, through which we have collected approximately 20,000 samples from the user studies involving 20 participants.
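The abstract names two mechanisms without giving detail: taps that become online learning labels, and a Dilated TCN. The sketch below is one possible reading, not the authors' implementation. The names `Tap`, `taps_to_labels`, `dilated_causal_conv`, and `receptive_field`, the millisecond frame grid, and the mapping of "activate" to label 1 and "interrupt" to label 0 are all assumptions made for illustration. The sequential replay strategy is left out because the abstract does not describe it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tap:
    time_ms: int  # when the user tapped, in milliseconds from stream start
    kind: str     # "activate" or "interrupt" (hypothetical event names)


def taps_to_labels(num_frames: int, frame_ms: int, taps: List[Tap]) -> List[Optional[int]]:
    """Turn tap events into per-frame labels.

    Assumed mapping: 1 = the agent should respond at this frame,
    0 = the agent should stay silent, None = no user feedback.
    """
    labels: List[Optional[int]] = [None] * num_frames
    for tap in taps:
        idx = tap.time_ms // frame_ms  # frame containing the tap
        if 0 <= idx < num_frames:
            labels[idx] = 1 if tap.kind == "activate" else 0
    return labels


def dilated_causal_conv(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Compute y[t] = sum_i weights[i] * x[t - i * dilation].

    Samples before t = 0 are treated as zero, so the output at time t
    never depends on future input (the causal property a TCN relies on).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input frames seen by one output frame of a stack of
    dilated causal convolutions, one layer per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel size of 3 and dilations 1, 2, 4, 8, `receptive_field` gives 31 frames, which shows why dilation is attractive for timing decisions: the context window grows quickly while the layer count stays small.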

