Decoders Laugh as Loud as Encoders
This work addresses the open question of whether AI understands humor, which is relevant to NLP researchers; however, its contribution is incremental, since it compares existing models on a single task.
The paper tackles the question of whether large language models understand humor. It compares a fine-tuned decoder (GPT-4o) with a fine-tuned encoder (RoBERTa) on humor detection and finds that GPT-4o achieves a mean macro-F1 score of 0.85, nearly matching RoBERTa's 0.86.
Since the dawn of the computer, Alan Turing dreamed of a machine that could communicate in natural language like a human being. Recent advances in Large Language Models (LLMs) have surprised the scientific community: a single model can be applied to a wide variety of natural language processing (NLP) tasks, and its output sometimes even exceeds the communication skills of most humans. Models such as GPT, Claude, and Grok have left their mark on the scientific community. However, it is unclear how much these models understand what they produce, especially in a nuanced domain such as humor. Whether computers understand humor remains an open question (among decoders, the most recent model evaluated was GPT-2). In this paper, we address this question and show that a fine-tuned decoder (GPT-4o) performs as well (mean macro-F1 of 0.85) as the best fine-tuned encoder (RoBERTa, mean macro-F1 of 0.86).
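The comparison above rests on the mean macro-F1 score. The following is a minimal sketch of how such a score is commonly computed. Macro-F1 is the unweighted average of per-class F1 scores, so the minority class counts as much as the majority class. The abstract does not say what the "mean" is taken over. This sketch assumes it averages macro-F1 across several evaluation runs, for example cross-validation folds or random seeds. The helper names `macro_f1` and `mean_macro_f1`, and the 0/1 "not humorous"/"humorous" labels, are hypothetical and chosen for illustration.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined precision/recall/F1 (zero denominator) is treated as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append(f1)
    return mean(per_class)


def mean_macro_f1(runs):
    """Average macro-F1 over (y_true, y_pred) pairs; runs = folds or seeds (assumed)."""
    return mean(macro_f1(t, p) for t, p in runs)


# Hypothetical labels: 1 = humorous, 0 = not humorous.
runs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # class 0: F1 = 0.8, class 1: F1 = 2/3
    ([1, 0, 1, 0], [1, 0, 1, 0]),  # perfect predictions: macro-F1 = 1.0
]
print(round(mean_macro_f1(runs), 4))
```

For binary labels, this per-class definition should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")` under its default zero-division handling. The from-scratch version is used here only to keep the sketch dependency-free.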