CL · Oct 21, 2024

Large Language Models Know What To Say But Not When To Speak

Muhammad Umair, Vasanth Sarathy, JP de Ruiter

arXiv:2410.16044v115.731 citations · h-index: 19 · EMNLP

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of making Spoken Dialogue Systems more natural and coherent for users. Its contribution is incremental: it evaluates existing models on a new dataset rather than proposing a new method.

The paper examines the difficulty Large Language Models (LLMs) have in predicting speaking opportunities, specifically within-turn Transition Relevance Places (TRPs), in unscripted conversations. It finds that current LLMs are limited in modeling these interactions and highlights areas for improvement.

Turn-taking is a fundamental mechanism in human communication that ensures smooth and coherent verbal interactions. Recent advances in Large Language Models (LLMs) have motivated their use in improving the turn-taking capabilities of Spoken Dialogue Systems (SDS), such as their ability to respond at appropriate times. However, existing models often struggle to predict opportunities for speaking -- called Transition Relevance Places (TRPs) -- in natural, unscripted conversations, focusing only on turn-final TRPs and not within-turn TRPs. To address these limitations, we introduce a novel dataset of participant-labeled within-turn TRPs and use it to evaluate the performance of state-of-the-art LLMs in predicting opportunities for speaking. Our experiments reveal the current limitations of LLMs in modeling unscripted spoken interactions, highlighting areas for improvement and paving the way for more naturalistic dialogue systems.
