CL · AI · Jan 14, 2023

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

arXiv:2301.05880v3 · 13 citations · h-index: 70
Originality: Synthesis-oriented
AI Analysis

This dataset supports AI researchers working on more human-like multi-modal chatbots, though the contribution is incremental: it builds on existing multi-modal dialogue datasets.

The authors introduce TikTalk, a video-based multi-modal dialogue dataset with 38K videos and 367K conversations, to study chatbots in a real-world chitchat context. They find that the baseline using knowledge graphs performs best overall, but that no existing model handles the challenges of capturing human interests and integrating external knowledge well.

To facilitate research on intelligent, human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also make it harder to capture human interests from intricate multi-modal information and generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of TikTalk, propose a video-based multi-modal chitchat task, and evaluate several dialogue baselines. Experimental results indicate that models incorporating large language models (LLMs) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs best overall. Furthermore, no existing model solves all of the above challenges well. There is still considerable room for future improvement, even for LLMs with visual extensions. Our dataset is available at https://ruc-aimind.github.io/projects/TikTalk/.

Code Implementations: 1 repo
