CL, AI, SD, AS | Jan 28, 2025

An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue

Koji Inoue, Divesh Lala, Mikey Elmers, Keiko Ochi, Tatsuya Kawahara

arXiv:2501.16643v2 | 13.07 citations | h-index: 19

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of identifying addressees in multi-party dialogue systems. The contribution is incremental, building on existing multi-party interaction tasks.

The paper tackled the problem of addressee recognition in multi-modal multi-party dialogues by constructing a triadic dialogue corpus and benchmarking GPT-4o, which achieved accuracy only marginally above chance, highlighting the task's difficulty.

Handling multi-party dialogues represents a significant step for advancing spoken dialogue systems, necessitating the development of tasks specific to multi-party interactions. To address this challenge, we are constructing a multi-modal multi-party dialogue corpus of triadic (three-participant) discussions. This paper focuses on the task of addressee recognition, identifying who is being addressed to take the next turn, a critical component unique to multi-party dialogue systems. A subset of the corpus was annotated with addressee information, revealing that explicit addressees are indicated in approximately 20% of conversational turns. To evaluate the task's complexity, we benchmarked the performance of a large language model (GPT-4o) on addressee recognition. The results showed that GPT-4o achieved an accuracy only marginally above chance, underscoring the challenges of addressee recognition in multi-party dialogue. These findings highlight the need for further research to enhance the capabilities of large language models in understanding and navigating the intricacies of multi-party conversational dynamics.
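The abstract reports GPT-4o's addressee-recognition accuracy relative to chance. The following Python sketch illustrates what such a comparison involves by scoring addressee predictions and computing two simple reference baselines. It rests on an assumption that is ours and not the paper's stated protocol: each evaluated turn has exactly one addressee, chosen from the two non-speaking participants of a triad, so a uniform random guess is correct with probability 1/2. The participant labels, data, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical participant labels for one triadic (three-person) session.
PARTICIPANTS = ("A", "B", "C")


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of turns whose predicted addressee matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def uniform_chance(speakers: list[str], participants=PARTICIPANTS) -> float:
    """Expected accuracy of guessing uniformly among the non-speakers.

    Assumption (not from the paper): every turn has exactly one addressee,
    and that addressee is never the current speaker.
    """
    if not speakers:
        raise ValueError("need at least one turn")
    for s in speakers:
        if s not in participants:
            raise ValueError(f"unknown speaker {s!r}")
    # With one speaker per turn, the candidates are the remaining participants.
    return 1 / (len(participants) - 1)


def majority_baseline(gold: list[str]) -> float:
    """Accuracy of always predicting the most frequent gold addressee.

    Optimistic, because it uses the gold label distribution.
    """
    if not gold:
        raise ValueError("need at least one turn")
    _, count = Counter(gold).most_common(1)[0]
    return count / len(gold)


# Hypothetical toy data: four turns of a triadic conversation.
speakers = ["A", "B", "C", "A"]
gold = ["B", "C", "A", "C"]  # hypothetical annotated addressees
pred = ["B", "C", "A", "B"]  # hypothetical model predictions

print(accuracy(gold, pred))      # 0.75
print(uniform_chance(speakers))  # 0.5
print(majority_baseline(gold))   # 0.5
```

Reporting both a uniform-chance and a majority-class baseline is a design choice in this sketch: if some participants are addressed far more often than others, a model can beat uniform chance simply by tracking the label distribution, so the majority baseline is a stricter reference point.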
