Pun Unintended: LLMs and the Illusion of Humor Understanding
This work addresses the robustness challenges in humor processing for LLMs, which is incremental as it builds on existing pun detection benchmarks.
The paper tackled the problem of LLMs' shallow understanding of puns by showing that subtle changes in puns can mislead them, with results indicating a lack of nuanced grasp compared to humans.
Puns are a form of humorous wordplay that exploits polysemy and phonetic similarity. While LLMs have shown promise in detecting puns, we show in this paper that their understanding often remains shallow, lacking the nuanced grasp typical of human interpretation. By systematically analyzing and reformulating existing pun benchmarks, we demonstrate how subtle changes in puns are sufficient to mislead LLMs. Our contributions include comprehensive and nuanced pun detection benchmarks, human evaluation across recent LLMs, and an analysis of the robustness challenges these models face in processing puns.