CL, AI · Jun 18, 2024

Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

arXiv:2406.12754v1 · 4 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the shortage of humor understanding resources for Chinese. The contribution is incremental, since it extends existing dataset creation methods to a new language and cultural context.

The authors address the lack of culturally nuanced humor datasets for Chinese by constructing Chumor from Ruo Zhi Ba. In A/B tests with native Chinese speakers, human-written joke explanations were judged significantly better than those generated by GPT-4o and ERNIE Bot, indicating that the dataset is challenging for state-of-the-art LLMs.

Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.
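The A/B comparison described in the abstract can be illustrated with a minimal sketch. It assumes each annotator judgment is recorded as one of the labels "human", "llm", or "tie"; these labels, the function names, and the choice of an exact sign test are assumptions made for illustration, not the authors' evaluation code, and the toy votes are made up rather than taken from the paper.

```python
from collections import Counter
from math import comb


def win_rates(votes):
    """Return the share of each outcome ('human', 'llm', 'tie') among the votes."""
    counts = Counter(votes)
    total = len(votes)
    return {label: counts.get(label, 0) / total for label in ("human", "llm", "tie")}


def sign_test_p_value(votes):
    """Two-sided exact sign test on non-tied votes.

    Null hypothesis: human and LLM explanations are equally likely to be preferred.
    The sign test is an assumed choice; the source does not name a specific test.
    """
    wins_human = sum(v == "human" for v in votes)
    wins_llm = sum(v == "llm" for v in votes)
    n = wins_human + wins_llm
    if n == 0:
        return 1.0  # only ties: no evidence either way
    k = max(wins_human, wins_llm)
    # Probability of k or more wins for one side under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy votes for illustration only (not results from the paper).
toy_votes = ["human"] * 7 + ["llm"] * 2 + ["tie"] * 1
rates = win_rates(toy_votes)
p_value = sign_test_p_value(toy_votes)
print(rates, p_value)
```

With so few toy votes the test has little power; a real evaluation would aggregate judgments over many jokes and annotators before drawing conclusions.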

