CL, AI | Apr 18, 2021

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

arXiv:2104.08667v2675 citations
Originality: Incremental advance
AI Analysis

This dataset addresses the problem of developing next-generation task-oriented dialog systems for real-world multimodal environments, though it is incremental as it builds upon existing data collection methods.

The authors tackled the lack of multimodal context in task-oriented dialog datasets by introducing SIMMC 2.0, a dataset with 11K dialogs (117K utterances) grounded in immersive scenes, and demonstrated promising baseline results using a state-of-the-art language model.

Next-generation task-oriented dialog systems need to understand conversational contexts together with their perceived surroundings in order to effectively help users in real-world multimodal environments. Existing task-oriented dialog datasets aimed at virtual assistance fall short because they do not situate the dialog in the user's multimodal context. To overcome this limitation, we present a new dataset for Situated and Interactive Multimodal Conversations, SIMMC 2.0, which includes 11K task-oriented user<->assistant dialogs (117K utterances) in the shopping domain, grounded in immersive and photo-realistic scenes. The dialogs are collected using a two-phase pipeline: (1) a novel multimodal dialog simulator generates simulated dialog flows, with an emphasis on diversity and richness of interactions, and (2) the generated utterances are manually paraphrased to collect diverse referring expressions. We provide an in-depth analysis of the collected dataset and describe in detail the four main benchmark tasks we propose. Our baseline model, powered by a state-of-the-art language model, shows promising results and highlights new challenges and directions for the community to study.

Code Implementations: 1 repo
