Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor
This work addresses the problem of multimodal humor comprehension for AI systems, though the contribution is incremental because it builds on existing TTS methods.
The study tackles humor understanding in LLMs by prompting with both the text and the spoken form of each joke, which improves humor explanations over text-only prompts on all tested datasets.
While Large Language Models (LLMs) have demonstrated impressive natural language understanding capabilities across various text-based tasks, understanding humor has remained a persistent challenge. Humor is frequently multimodal, relying on phonetic ambiguity, rhythm and timing to convey meaning. In this study, we explore a simple multimodal prompting approach to humor understanding and explanation. We present an LLM with both the text and the spoken form of a joke, generated using an off-the-shelf text-to-speech (TTS) system. Using multimodal cues improves the explanations of humor compared to textual prompts across all tested datasets.
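To make the prompting setup concrete, the following is a minimal sketch of how a joke's text and a spoken rendering might be packaged into one prompt. It is not the authors' pipeline: the function names, the payload fields, and the instruction wording are all hypothetical, and the TTS step is replaced by a stub that emits silent audio so the example runs with the Python standard library alone. A real setup would swap the stub for an off-the-shelf TTS system and pass the payload to an audio-capable LLM in whatever format that model expects.

```python
import base64
import io
import wave


def synthesize_speech_stub(text: str, sample_rate: int = 16000) -> bytes:
    """Hypothetical stand-in for a TTS system: returns silent mono WAV bytes.

    The duration grows with the word count (assumed 0.3 s per word) only so
    that the placeholder audio has a plausible length.
    """
    n_words = max(1, len(text.split()))
    frames_per_word = sample_rate * 3 // 10
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (frames_per_word * n_words))
    return buffer.getvalue()


def build_multimodal_prompt(joke: str) -> dict:
    """Bundle the joke text with its (stubbed) spoken form.

    The field names are illustrative assumptions, not a real model API.
    """
    audio_bytes = synthesize_speech_stub(joke)
    return {
        "instruction": "Explain why this joke is funny.",
        "text": joke,
        "audio_wav_base64": base64.b64encode(audio_bytes).decode("ascii"),
    }


def build_text_only_prompt(joke: str) -> dict:
    """Text-only baseline prompt, for comparison with the multimodal one."""
    return {"instruction": "Explain why this joke is funny.", "text": joke}


if __name__ == "__main__":
    joke = "Why did the bicycle fall over? It was two-tired."
    prompt = build_multimodal_prompt(joke)
    print(sorted(prompt.keys()))
```

The two builders differ only in the audio field, which mirrors the comparison described in the abstract: the same joke and instruction, with and without the spoken form.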