Improving End-of-turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task
This work addresses end-of-turn detection for automated dialogue systems. Its contribution is incremental, since it builds on the known influence of speaker intentions on turn-taking.
The paper tackles the problem of predicting turn-transitions in spoken dialogues by proposing a multi-task neural approach that simultaneously predicts speaker intentions as an auxiliary task, resulting in improved performance without needing extra features at run-time.
This work focuses on the use of acoustic cues for modeling turn-taking in dyadic spoken dialogues. Previous work has shown that speaker intentions (e.g., asking a question or uttering a backchannel) can influence turn-taking behavior and are good predictors of turn-transitions in spoken dialogues. However, speaker intentions are not readily available to automated systems at run-time, which makes it difficult to use this information to anticipate a turn-transition. To this end, we propose a multi-task neural approach for predicting turn-transitions and speaker intentions simultaneously. Our results show that adding the auxiliary task of speaker intention prediction improves the performance of turn-transition prediction in spoken dialogues, without relying on additional input features during run-time.
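
The multi-task idea can be illustrated with a minimal sketch of a joint training objective. The abstract does not specify the network architecture, the loss functions, or how the two tasks are weighted, so everything below is an assumption made for illustration: the function names, the binary formulation of turn-transition prediction, the three-way intention label set, and the `aux_weight` parameter are all hypothetical. The point it shows is that intention labels enter only through the training loss, so they are not needed as inputs at run-time.

```python
import math

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(prob, label):
    """Main-task loss: is the turn about to change (label=1) or not (label=0)?"""
    return -(label * math.log(prob + EPS) + (1 - label) * math.log(1 - prob + EPS))


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Auxiliary-task loss: which intention class (by index) is being expressed?"""
    return -math.log(softmax(logits)[label] + EPS)


def joint_loss(turn_prob, turn_label, intent_logits, intent_label, aux_weight=0.5):
    """Weighted sum of the main and auxiliary losses (aux_weight is hypothetical)."""
    main = binary_cross_entropy(turn_prob, turn_label)
    aux = cross_entropy(intent_logits, intent_label)
    return main + aux_weight * aux


# Hypothetical outputs of a shared acoustic encoder with two prediction heads.
# Intention classes (assumed): 0 = question, 1 = backchannel, 2 = statement.
loss = joint_loss(turn_prob=0.9, turn_label=1,
                  intent_logits=[2.0, 0.1, -1.0], intent_label=0)
print(f"joint loss: {loss:.4f}")
```

At inference time only the turn-transition output would be read. The intention head serves purely as a training signal, which is consistent with the claim that no additional input features are required during run-time.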