CL · RO · Jan 15, 2025

Applying General Turn-taking Models to Conversational Human-Robot Interaction

arXiv:2501.08946v1 · 17 citations · h-index: 13 · HRI
Originality: Incremental advance
AI Analysis

This paper aims to make conversations in human-robot interaction more natural and efficient. The contribution is incremental: it adapts existing turn-taking models to a new domain rather than proposing new ones.

The paper tackles unnatural turn-taking in human-robot interaction by applying general turn-taking models, TurnGPT and Voice Activity Projection, to improve conversational dynamics. In a study with 39 adults, participants significantly preferred the proposed system, and it significantly reduced response delays and interruptions.

Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
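
To make the contrast between a silence-based baseline and a predictive policy concrete, here is a minimal Python sketch. It is not the authors' implementation: the `TurnSignals` fields, the function names, the way the two model outputs are combined, and all thresholds are hypothetical, chosen only to illustrate how a text-based end-of-turn estimate (in the spirit of TurnGPT) and an audio-based turn-shift estimate (in the spirit of VAP) might be used in tandem.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    PREPARE = "prepare_response"
    TAKE_TURN = "take_turn"


@dataclass
class TurnSignals:
    silence_ms: float        # time since the user last produced speech
    text_eot_prob: float     # hypothetical text-based end-of-turn probability
    audio_shift_prob: float  # hypothetical audio-based probability of a turn shift


def baseline_policy(s: TurnSignals, silence_threshold_ms: float = 1000.0) -> Action:
    """Silence-only rule: take the turn once the pause is long enough."""
    if s.silence_ms >= silence_threshold_ms:
        return Action.TAKE_TURN
    return Action.WAIT


def predictive_policy(
    s: TurnSignals,
    prepare_at: float = 0.5,        # assumed threshold, not from the paper
    take_at: float = 0.7,           # assumed threshold, not from the paper
    min_silence_ms: float = 200.0,  # assumed short gap before speaking
) -> Action:
    """Combine both estimates: prepare early, speak only when both agree."""
    both_agree = s.audio_shift_prob >= take_at and s.text_eot_prob >= prepare_at
    if both_agree and s.silence_ms >= min_silence_ms:
        return Action.TAKE_TURN
    if s.text_eot_prob >= prepare_at:
        # The utterance looks complete in text, so start generating a reply.
        return Action.PREPARE
    return Action.WAIT


# A long mid-utterance pause: the baseline interrupts, the predictive rule waits.
hesitation = TurnSignals(silence_ms=1200.0, text_eot_prob=0.1, audio_shift_prob=0.2)
# A clear turn ending after a short gap: the baseline is still waiting.
clear_end = TurnSignals(silence_ms=250.0, text_eot_prob=0.9, audio_shift_prob=0.8)
```

Under these assumed thresholds, the `hesitation` case shows how a silence-only rule can interrupt a user who is merely pausing, while the `clear_end` case shows how it can add response delay after a turn that has plainly finished.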

