CV | AI | Jul 12, 2024

Predicting Winning Captions for Weekly New Yorker Comics

arXiv:2407.18949v1 | 1 citation | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of automating humorous caption generation for specific cartoons, an incremental, domain-specific application.

The paper tackles generating witty captions for New Yorker cartoons using Vision Transformers and proposes new baselines for the task.

Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wit and humor of winning entries in the New Yorker Cartoon Caption Contest. This task requires sophisticated visual and linguistic processing, along with an understanding of cultural nuances and humor. We propose several new baselines that use Vision Transformer encoder-decoder models to generate captions for the New Yorker Cartoon Caption Contest.
