CL · Mar 3, 2025

KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines

arXiv:2503.01510v2 · 1 citation · h-index: 1 · Has Code · RANLP
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better wordplay understanding in natural language processing, particularly for Russian, by providing a new dataset that includes headline context and underrepresented wordplay types. Its contribution is incremental, however, since it builds on existing humor research.

The authors tackle wordplay detection and interpretation by introducing KoWit-24, a dataset of 2,700 Russian news headlines with fine-grained wordplay annotations. Their experiments with five LLMs show significant room for improvement on both tasks.

We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and the words/phrases the wordplay refers to. Unlike the majority of existing humor collections of canned jokes, KoWit-24 provides wordplay contexts: each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities, a mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24
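To make the annotation layers listed in the abstract concrete, here is a minimal sketch of how one annotated headline might be represented in code, together with precision/recall/F1 for the binary detection task. This is an illustration under stated assumptions, not the dataset's actual format or the authors' evaluation scripts (those are in the linked repository). The class names `WordplayAnnotation` and `HeadlineRecord`, their field names, the type label, and the `detection_scores` helper are all hypothetical, and the choice of F1 as the detection metric is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class WordplayAnnotation:
    """One wordplay instance in a headline (hypothetical schema, not KoWit-24's file format)."""
    wordplay_type: str        # e.g. "idiom transformation" (illustrative label only)
    anchor: str               # word/phrase in the headline that carries the wordplay
    reference: Optional[str]  # collocation, idiom, or named entity the wordplay refers to


@dataclass
class HeadlineRecord:
    """A headline plus the news context (lead and summary) the abstract describes."""
    headline: str
    lead: str
    summary: str
    annotations: List[WordplayAnnotation] = field(default_factory=list)

    @property
    def has_wordplay(self) -> bool:
        # Under this hypothetical schema, a headline "has wordplay" iff it has >= 1 annotation.
        return bool(self.annotations)


def detection_scores(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary wordplay detection (positive class = wordplay)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder records, not real dataset entries.
    records = [
        HeadlineRecord("Headline A", "lead A", "summary A",
                       [WordplayAnnotation("idiom transformation", "anchor", "source idiom")]),
        HeadlineRecord("Headline B", "lead B", "summary B"),
    ]
    gold = [r.has_wordplay for r in records]
    pred = [True, True]  # hypothetical model outputs
    print(detection_scores(gold, pred))
```

Keeping the lead and summary on the same record as the headline reflects the abstract's point that each headline comes with its context, so a model prompt can include it. The interpretation task (recovering the referenced idiom or entity) would need a different, likely softer, comparison than this exact-match detection score.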

Code Implementations: 1 repo
