RO · CL · SD · Mar 8, 2025

A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment

arXiv:2503.06241v2 · 3 citations · h-index: 19 · IROS
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of real-world noise in human-robot interaction, though it appears incremental because it builds on existing turn-taking models.

The study tackled the problem of turn-taking robustness in real-world dialogue robots by proposing a noise-robust voice activity projection model, which significantly reduced response latency in a field experiment, leading to faster and more natural conversations.

Turn-taking is a crucial aspect of human-robot interaction, directly influencing conversational fluidity and user engagement. While previous research has explored turn-taking models in controlled environments, their robustness in real-world settings remains underexplored. In this study, we propose a noise-robust voice activity projection (VAP) model, based on a Transformer architecture, to enhance real-time turn-taking in dialogue robots. To evaluate the effectiveness of the proposed system, we conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system. Our analysis covered both subjective user evaluations and objective behavioral analysis. The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation where both the robot and users responded faster. The subjective evaluations suggested that faster responses contribute to a better interaction experience.
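The latency comparison described in the abstract can be illustrated with a toy sketch. The abstract does not describe how either system decides when to respond, so everything below is an assumption made for illustration: the 20 ms frame size, the 800 ms silence timeout standing in for a conventional endpointing baseline, the 0.5 threshold, the hypothetical `p_shift` turn-shift probabilities standing in for a VAP-style model output, and all function names. The numbers are invented and are not results from the paper.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # assumed frame hop; not taken from the paper


def timeout_response_frame(user_active: Sequence[bool],
                           timeout_frames: int) -> Optional[int]:
    """Assumed baseline: respond once the user has stayed silent for
    `timeout_frames` consecutive frames after having spoken."""
    heard_speech = False
    silent_run = 0
    for i, active in enumerate(user_active):
        if active:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= timeout_frames:
                return i
    return None


def projection_response_frame(user_active: Sequence[bool],
                              p_shift: Sequence[float],
                              threshold: float = 0.5) -> Optional[int]:
    """Assumed projection-style policy: respond at the first silent frame
    where the (hypothetical) turn-shift probability reaches `threshold`."""
    heard_speech = False
    for i, (active, p) in enumerate(zip(user_active, p_shift)):
        if active:
            heard_speech = True
        elif heard_speech and p >= threshold:
            return i
    return None


def latency_ms(user_active: Sequence[bool], response_frame: int,
               frame_ms: int = FRAME_MS) -> int:
    """Time from the user's last active frame to the robot's response."""
    last_active = max(i for i, a in enumerate(user_active) if a)
    return (response_frame - last_active) * frame_ms


# Invented example: 200 ms of user speech followed by 1 s of silence.
user_active = [True] * 10 + [False] * 50
# Invented probabilities that rise shortly after the user stops speaking.
p_shift = [0.1] * 10 + [0.3, 0.4, 0.6] + [0.9] * 47

baseline_frame = timeout_response_frame(user_active, timeout_frames=800 // FRAME_MS)
projection_frame = projection_response_frame(user_active, p_shift)

baseline_latency = latency_ms(user_active, baseline_frame)      # 800 ms
projection_latency = latency_ms(user_active, projection_frame)  # 60 ms
```

In this toy setting the timeout policy can never respond sooner than its fixed silence window, while the projection-style policy can respond as soon as its probability crosses the threshold. That is the general intuition behind lower response latency, not a reproduction of the field experiment.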

Code Implementations: 1 repo
