CLAIHCOct 17, 2023

EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset

arXiv:2310.10967v1h-index: 9Has Code
Originality Incremental advance
AI Analysis

This addresses data collection challenges for researchers in multimodal dialogue systems, though it appears incremental as it builds on existing large model methods.

The authors tackled the problem of high-quality data scarcity for open-domain multimodal dialogue tasks by proposing a Multimodal Data Construction Framework (MDCF) that generates well-formed content with explanations, resulting in the release of the EXMODD dataset and showing a positive correlation between accurate understanding and response quality.

The need for high-quality data has been a key issue hindering the research of dialogue tasks. Recent studies try to build datasets through manual, web crawling, and large pre-trained models. However, man-made data is expensive and data collected from the internet often includes generic responses, meaningless statements, and toxic dialogues. Automatic data generation through large models is a cost-effective method, but for open-domain multimodal dialogue tasks, there are still three drawbacks: 1) There is currently no open-source large model that can accept multimodal input; 2) The content generated by the model lacks interpretability; 3) The generated data is usually difficult to quality control and require extensive resource to collect. To alleviate the significant human and resource expenditure in data collection, we propose a Multimodal Data Construction Framework (MDCF). MDCF designs proper prompts to spur the large-scale pre-trained language model to generate well-formed and satisfactory content. Additionally, MDCF also automatically provides explanation for a given image and its corresponding dialogue, which can provide a certain degree of interpretability and facilitate manual follow-up quality inspection. Based on this, we release an Explanatory Multimodal Open-Domain dialogue dataset (EXMODD). Experiments indicate a positive correlation between the model's ability to generate accurate understandings and high-quality responses. Our code and data can be found at https://github.com/poplpr/EXMODD.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes