CL · AI · Jul 1, 2025

MemeCMD: An Automatically Generated Chinese Multi-turn Dialogue Dataset with Contextually Retrieved Memes

arXiv:2507.00891v1
Originality: Synthesis-oriented
AI Analysis

MemeCMD provides a scalable, privacy-preserving resource for advancing multimodal conversational AI, meeting a domain-specific need of researchers in that field.

The paper tackles the lack of multimodal dialogue datasets by introducing MemeCMD, an automatically generated Chinese multi-turn dialogue dataset with contextually retrieved memes, which combines a large-scale MLLM-annotated meme library with auto-generated dialogues across diverse scenarios.

Memes are widely used in online social interactions, providing a vivid, intuitive, and often humorous means of expressing intentions and emotions. Existing dialogue datasets are predominantly limited to either manually annotated or pure-text conversations, lacking the expressiveness and contextual nuance that multimodal interactions provide. To address these challenges, we introduce MemeCMD, an automatically generated Chinese Multi-turn Dialogue dataset with contextually retrieved memes. Our dataset combines a large-scale, MLLM-annotated meme library with dialogues auto-generated by dual agents across diverse scenarios. We introduce a retrieval framework and an adaptive threshold to ensure contextually relevant, naturally spaced meme usage. Experiments demonstrate the effectiveness of our approach in generating contextually appropriate and diverse meme-incorporated dialogues, offering a scalable and privacy-preserving resource for advancing multimodal conversational AI.
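
The abstract names a retrieval framework with an adaptive threshold but gives no details, so the following is only an illustrative sketch of how such a mechanism might work, not the authors' method. It assumes each dialogue turn and each meme annotation has already been embedded as a vector, picks the most similar meme by cosine similarity, and raises the acceptance threshold right after a meme is used so that memes stay spaced out. The names `retrieve_meme` and `attach_memes` and the parameters `base`, `bump`, and `decay` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_meme(turn_vec, library):
    """Return (meme_id, score) for the best match in a dict of id -> vector."""
    best_id, best_score = None, -1.0
    for meme_id, vec in library.items():
        score = cosine(turn_vec, vec)
        if score > best_score:
            best_id, best_score = meme_id, score
    return best_id, best_score


def attach_memes(turn_vecs, library, base=0.6, bump=0.3, decay=0.1):
    """Attach at most one meme per turn using a threshold that adapts over time.

    Assumed rule (hypothetical): after a meme is used, the threshold jumps to
    base + bump; on each turn without a meme it decays back toward base.
    """
    threshold = base
    chosen = []
    for vec in turn_vecs:
        meme_id, score = retrieve_meme(vec, library)
        if meme_id is not None and score >= threshold:
            chosen.append(meme_id)
            threshold = min(1.0, base + bump)  # raise the bar after a meme
        else:
            chosen.append(None)
            threshold = max(base, threshold - decay)  # relax toward base
    return chosen
```

With a toy library such as `{"happy": [1, 0], "sad": [0, 1]}`, a turn that only moderately matches a meme is skipped if it directly follows a meme-bearing turn, whereas a fixed threshold (`bump=0`) would attach a meme to it as well.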

