"President Vows to Cut <Taxes> Hair": Dataset and Analysis of Creative Text Editing for Humorous Headlines
This work addresses computational humor by giving researchers and developers a curated dataset and initial models. Its contribution is incremental in that it builds on existing humor theories and methods.
The authors introduced and analyzed the Humicroedit dataset, consisting of 15,095 English news headlines edited for humor through simple word replacements, and developed baseline classifiers to predict humor as a step toward automatic humorous headline generation.
We introduce, release, and analyze a new dataset, called Humicroedit, for research in computational humor. Our publicly available data consists of regular English news headlines paired with versions of the same headlines that contain simple replacement edits designed to make them funny. We carefully curated crowdsourced editors to create funny headlines and judges to score a total of 15,095 edited headlines, with five judges per headline. The simple edits, usually just a single word replacement, mean we can apply straightforward analysis techniques to determine what makes our edited headlines humorous. We show how the data support classic theories of humor, such as incongruity, superiority, and setup/punchline. Finally, we develop baseline classifiers that can predict whether or not an edited headline is funny, which is a first step toward automatically generating humorous headlines as an approach to creating topical humor.
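To make the data layout concrete, the following is a minimal Python sketch of how one edited headline and its judge scores might be handled. It uses the example from the title. The angle-bracket notation for the replaced word, the function names `apply_edit` and `mean_score`, and the five score values are assumptions made for illustration; they are not taken from the released data, and the numeric rating scale is likewise assumed.

```python
import re

# Assumed notation: the word to be replaced is wrapped in angle brackets,
# as in the title "President Vows to Cut <Taxes> Hair".
SPAN = re.compile(r"<([^<>]+)>")


def apply_edit(headline: str, replacement: str) -> str:
    """Return the headline with its single marked word swapped for `replacement`."""
    marked = SPAN.findall(headline)
    if len(marked) != 1:
        raise ValueError("expected exactly one <...> span in the headline")
    # A function is used as the replacement so the text is inserted literally.
    return SPAN.sub(lambda _match: replacement, headline)


def mean_score(scores: list[int]) -> float:
    """Average the judges' ratings for one edited headline."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


original = "President Vows to Cut <Taxes>"
edited = apply_edit(original, "Hair")
print(edited)  # President Vows to Cut Hair

# Hypothetical ratings from five judges (placeholder values, assumed scale).
judge_scores = [2, 3, 1, 2, 2]
print(mean_score(judge_scores))  # 2.0
```

Because each edit is a single replacement, a mean rating per headline like the one above is enough to label examples as funny or not funny for a baseline classifier, with the threshold left as a modeling choice.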