CL | CY | Dec 1, 2024

Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor

arXiv:2412.05315v1 | 24 citations | h-index: 1 | COLING Workshops
Originality: Incremental advance
AI Analysis

This paper addresses multimodal humor comprehension in AI systems. The advance is incremental: it builds on existing text-to-speech (TTS) methods.

The study tackles humor understanding in LLMs by prompting with both the text and the spoken form of each joke, which improved humor explanations across all tested datasets.

While Large Language Models (LLMs) have demonstrated impressive natural language understanding capabilities across various text-based tasks, understanding humor has remained a persistent challenge. Humor is frequently multimodal, relying on phonetic ambiguity, rhythm and timing to convey meaning. In this study, we explore a simple multimodal prompting approach to humor understanding and explanation. We present an LLM with both the text and the spoken form of a joke, generated using an off-the-shelf text-to-speech (TTS) system. Using multimodal cues improves the explanations of humor compared to textual prompts across all tested datasets.
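The prompting setup described above, pairing a joke's text with its TTS-generated audio, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `placeholder_tts` is a hypothetical stand-in that emits a silent WAV rather than real speech, the payload layout returned by `build_multimodal_prompt` is an assumed generic structure rather than any particular model's API, and the instruction wording is invented for the example.

```python
import base64
import io
import wave
from typing import Callable


def placeholder_tts(text: str, sample_rate: int = 16000) -> bytes:
    """Hypothetical stand-in for an off-the-shelf TTS system.

    Returns a silent mono 16-bit WAV lasting 0.05 s per character.
    A real system would return synthesized speech instead.
    """
    frames_per_char = sample_rate // 20  # 0.05 s of audio per character
    n_frames = frames_per_char * max(len(text), 1)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def build_multimodal_prompt(
    joke: str, tts: Callable[[str], bytes] = placeholder_tts
) -> dict:
    """Bundle the joke text and its spoken form into one prompt payload.

    The dict layout is an assumption made for illustration only.
    """
    audio_bytes = tts(joke)
    return {
        "instruction": "Explain why the following joke is funny.",
        "parts": [
            {"type": "text", "text": joke},
            {
                "type": "audio",
                "format": "wav",
                "data_b64": base64.b64encode(audio_bytes).decode("ascii"),
            },
        ],
    }


def build_text_only_prompt(joke: str) -> dict:
    """Text-only baseline: same instruction, no audio part."""
    return {
        "instruction": "Explain why the following joke is funny.",
        "parts": [{"type": "text", "text": joke}],
    }


if __name__ == "__main__":
    example = "I used to be a banker, but I lost interest."  # illustrative pun
    prompt = build_multimodal_prompt(example)
    print([part["type"] for part in prompt["parts"]])
```

Passing the TTS step in as a callable keeps the sketch independent of any specific speech library, and the text-only builder gives the baseline condition that the multimodal prompt is compared against.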
