Dark & Stormy: Modeling Humor in the Worst Sentences Ever Written
This paper addresses the challenge of computational humor analysis for diverse, intentionally bad humor. The contribution is incremental: it builds on existing humor studies by adding a new dataset.
The paper tackles the problem of modeling intentionally bad humor by curating and analyzing a novel corpus from the Bulwer-Lytton Fiction Contest. It finds that standard humor detection models perform poorly on this corpus, and that LLMs over-use certain literary devices and produce far more novel adjective-noun bigrams than human writers.
Textual humor is enormously diverse and computational studies need to account for this range, including intentionally bad humor. In this paper, we curate and analyze a novel corpus of sentences from the Bulwer-Lytton Fiction Contest to better understand "bad" humor in English. Standard humor detection models perform poorly on our corpus, and an analysis of literary devices finds that these sentences combine features common in existing humor datasets (e.g., puns, irony) with metaphor, metafiction and simile. LLMs prompted to synthesize contest-style sentences imitate the form but exaggerate the effect by over-using certain literary devices, and including far more novel adjective-noun bigrams than human writers. Data, code and analysis are available at https://github.com/venkatasg/bulwer-lytton
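The adjective-noun bigram comparison mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how novelty is defined, so the sketch assumes a bigram is "novel" when it is absent from some reference set of previously seen pairs. The function names, the `(word, tag)` input format, and the `ADJ`/`NOUN` tag labels are all assumptions made for illustration; in practice the tags would come from a part-of-speech tagger.

```python
from typing import List, Set, Tuple

# Assumed input format: each sentence is a list of (word, coarse POS tag) pairs.
TaggedToken = Tuple[str, str]
Bigram = Tuple[str, str]


def adj_noun_bigrams(sentence: List[TaggedToken]) -> List[Bigram]:
    """Return lowercased (adjective, noun) pairs for adjacent ADJ, NOUN tokens."""
    pairs = []
    for (word1, tag1), (word2, tag2) in zip(sentence, sentence[1:]):
        if tag1 == "ADJ" and tag2 == "NOUN":
            pairs.append((word1.lower(), word2.lower()))
    return pairs


def novel_bigram_rate(sentences: List[List[TaggedToken]],
                      reference: Set[Bigram]) -> float:
    """Fraction of adjective-noun bigrams that do not occur in `reference`.

    `reference` is a hypothetical set of bigrams drawn from a background
    corpus; returns 0.0 when the sentences contain no adjective-noun bigrams.
    """
    total = 0
    novel = 0
    for sentence in sentences:
        for pair in adj_noun_bigrams(sentence):
            total += 1
            if pair not in reference:
                novel += 1
    return novel / total if total else 0.0


# Toy, hand-tagged data for illustration only (not taken from the corpus).
reference_bigrams = {("dark", "night")}
toy_sentences = [
    [("It", "PRON"), ("was", "AUX"), ("a", "DET"), ("dark", "ADJ"), ("night", "NOUN")],
    [("a", "DET"), ("gelatinous", "ADJ"), ("sunset", "NOUN")],
]
rate = novel_bigram_rate(toy_sentences, reference_bigrams)
```

Computing this rate separately for human-written and LLM-generated sentences against the same reference set would give one way to compare the two groups, under the novelty definition assumed here.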