Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language Models
This work addresses the challenge of adapting culturally specific memes for cross-cultural communication, an incremental advance in multimodal generation for online content.
The paper tackles cross-cultural meme transcreation, a multimodal generation task, by proposing a hybrid framework based on vision-language models and introducing a large-scale bidirectional dataset of Chinese and US memes. Human judgments and automated evaluation on 6,315 meme pairs show that current models can perform transcreation only to a limited extent, with US-Chinese transcreation achieving higher quality than Chinese-US.
Memes are a pervasive form of online communication, yet their cultural specificity poses significant challenges for cross-cultural adaptation. We study cross-cultural meme transcreation, a multimodal generation task that aims to preserve communicative intent and humor while adapting culture-specific references. We propose a hybrid transcreation framework based on vision-language models and introduce a large-scale bidirectional dataset of Chinese and US memes. Using both human judgments and automated evaluation, we analyze 6,315 meme pairs and assess transcreation quality across cultural directions. Our results show that current vision-language models can perform cross-cultural meme transcreation to a limited extent, but exhibit clear directional asymmetries: US-Chinese transcreation consistently achieves higher quality than Chinese-US. We further identify which aspects of humor and visual-textual design transfer across cultures and which remain challenging, and propose an evaluation framework for assessing cross-cultural multimodal generation. Our code and dataset are publicly available at https://github.com/AIM-SCU/MemeXGen.