CL | AI | LG | Feb 23, 2024

Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

arXiv:2403.00794v2 | 32 citations | h-index: 7 | Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of humor datasets for NLP researchers, though it is incremental as it builds on existing LLM capabilities for data generation.

The paper addresses humor detection in natural language processing by using large language models to generate synthetic datasets. It shows that LLMs can effectively 'unfun' jokes and, on a code-mixed English-Hindi dataset, that GPT-4's synthetic data is rated highly by bilingual annotators and provides challenging adversarial examples for humor classifiers.

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs) can generate synthetic data for humor detection by editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.
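
The pairing idea in the abstract (a humorous text alongside a minimally edited, non-humorous counterpart) can be illustrated with a minimal sketch. This is not the authors' pipeline: the prompt wording, the `LabeledText` record, and the function names `build_unfun_prompt` and `make_paired_dataset` are all hypothetical, and `edit_fn` stands in for whatever LLM call is used, so no specific model API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LabeledText:
    text: str
    label: int  # 1 = humorous original, 0 = edited ("unfunned") counterpart


def build_unfun_prompt(joke: str) -> str:
    """Hypothetical instruction; the paper's actual prompt is not shown here."""
    return (
        "Edit the following text so that it is no longer funny, "
        "changing as few words as possible.\n\n"
        f"Text: {joke}\nEdited:"
    )


def make_paired_dataset(
    jokes: Iterable[str], edit_fn: Callable[[str], str]
) -> List[LabeledText]:
    """Pair each joke with its edited version for a binary humor classifier.

    `edit_fn` is an assumption: any callable that maps a prompt string to the
    model's edited text (for example, a thin wrapper around an LLM client).
    """
    rows: List[LabeledText] = []
    for joke in jokes:
        edited = edit_fn(build_unfun_prompt(joke)).strip()
        # Skip empty or unchanged outputs, which give no contrastive pair.
        if not edited or edited == joke:
            continue
        rows.append(LabeledText(joke, 1))
        rows.append(LabeledText(edited, 0))
    return rows


if __name__ == "__main__":
    # Stub editor used only to show the data flow; a real run would call an LLM.
    def stub_editor(prompt: str) -> str:
        return "Scientists publish a routine study on cats."

    for row in make_paired_dataset(["Scientists confirm cats are a liquid."], stub_editor):
        print(row.label, row.text)
```

Keeping the editor as a plain callable means the same pairing code could be reused across models, and the filter on unchanged outputs reflects one possible quality check rather than anything stated in the paper.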

Code Implementations: 1 repo