Punny Captions: Witty Wordplay in Image Descriptions
This paper addresses the challenge of creating engaging, humorous image captions, though its contribution is incremental because it builds on existing captioning methods.
The paper tackles the problem of generating witty image descriptions using puns. It develops a retrieval approach and an encoder-decoder neural network approach, which show substantial improvements over baselines in human studies. When humans write under similar constraints on word usage and style, the model's descriptions are rated slightly wittier than the human-written ones.
Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.
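As a rough illustration of the retrieval idea described above, here is a minimal sketch of pun-based sentence retrieval. It is not the authors' pipeline: the homophone table, the function names (`tokenize`, `retrieve_punny`), and the matching rule are all hypothetical assumptions, chosen only to show how image tags might be matched against a sentence corpus through similar-sounding words.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy homophone table (an assumption for illustration);
# a real system would need a far larger pronunciation-based lexicon.
HOMOPHONES: Dict[str, Set[str]] = {
    "bear": {"bare"},
    "bare": {"bear"},
    "sea": {"see"},
    "see": {"sea"},
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and strip surrounding punctuation from each word."""
    return [w.strip(".,!?;:\"'").lower() for w in sentence.split()]


def retrieve_punny(image_tags: List[str], corpus: List[str]) -> List[Tuple[str, str]]:
    """Return (sentence, tag) pairs where the sentence contains a homophone of the tag.

    The tag is a word describing the image; the sentence uses its sound-alike,
    so the pair is a candidate for a pun-based description.
    """
    tags = sorted({t.lower() for t in image_tags})
    results: List[Tuple[str, str]] = []
    for sentence in corpus:
        tokens = set(tokenize(sentence))
        for tag in tags:
            # Match if any sound-alike of the tag appears in the sentence.
            if tokens & HOMOPHONES.get(tag, set()):
                results.append((sentence, tag))
                break  # keep at most one match per sentence
    return results


corpus = [
    "I can not bare the cold.",
    "The dog ran home.",
    "We went to see the show.",
]
matches = retrieve_punny(["bear", "sea"], corpus)
```

In this sketch, an image tagged "bear" would surface the sentence containing "bare" as a candidate. How candidates are then ranked or rewritten is left open here, since the abstract does not specify it.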