CL | Mar 10, 2025

KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus

arXiv:2503.06899v2 | 12 citations | h-index: 32 | NAACL
Originality: Incremental advance
AI Analysis

This work addresses a problem for developers of video-based dialogue systems: it provides a dataset intended to improve versatility across scenarios such as question answering and emotional dialogue. The contribution is incremental, as it builds on existing video-dialogue research.

The paper tackles the limitation that video-based dialogue systems are restricted to a single dialogue type. It introduces KwaiChat, a large-scale corpus of 93,209 videos and 246,080 dialogues across 4 dialogue types, 30 domains, 4 languages, and 13 topics. Among the baseline models, GPT-4o performs best but still performs poorly, even with in-context learning and fine-tuning.

Video-based dialogue systems, such as education assistants, have compelling application value and are attracting growing interest. However, current video-based dialogue systems are limited by their reliance on a single dialogue type, which hinders their versatility in practical applications across a range of scenarios, such as question answering and emotional dialogue. In this paper, we frame this challenge as how to generate video-driven multilingual mixed-type dialogues. To address it, we propose a novel task and create a human-to-human video-driven multilingual mixed-type dialogue corpus, termed KwaiChat, containing a total of 93,209 videos and 246,080 dialogues across 4 dialogue types, 30 domains, 4 languages, and 13 topics. Additionally, we establish baseline models on KwaiChat. An extensive analysis of 7 distinct LLMs on KwaiChat reveals that GPT-4o achieves the best performance but still does not perform well on this task, even with the help of in-context learning and fine-tuning, which indicates that the task is not trivial and needs further research.

Code Implementations: 1 repo
