Humor Detection: A Transformer Gets the Last Laugh
This paper addresses humor identification for natural language processing applications, but the contribution is incremental: it extends existing humor detection tasks with a new dataset and model.
The paper tackles humor detection in text by proposing a new task, assessing whether a joke is humorous, and training a Transformer model on almost 16,000 labeled instances derived from Reddit ratings. It reports results comparable to human performance on this task and outperforms previous work on existing humor identification datasets, with F-measures of 93.1% on the Puns dataset and 98.6% on the Short Jokes dataset.
Much previous work has been done in attempting to identify humor in text. In this paper we extend that capability by proposing a new task: assessing whether or not a joke is humorous. We present a novel way of approaching this problem by building a model that learns to identify humorous jokes based on ratings gleaned from Reddit pages, consisting of almost 16,000 labeled instances. Using these ratings to determine the level of humor, we then employ a Transformer architecture for its advantages in learning from sentence context. We demonstrate the effectiveness of this approach and show results that are comparable to human performance. We further demonstrate our model's increased capabilities on humor identification problems, such as the previously created datasets for short jokes and puns. These experiments show that this method outperforms all previous work done on these tasks, with an F-measure of 93.1% for the Puns dataset and 98.6% on the Short Jokes dataset.
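The headline numbers above are F-measures, so it may help to recall how that metric is computed for a binary humorous / not-humorous classifier. The following is a minimal sketch, assuming the balanced F1 variant (beta = 1) and labels encoded as 1 for humorous and 0 for not humorous; the abstract does not state which F-measure variant or label encoding the authors used, and the function name `f_measure` and the toy labels are purely illustrative, not taken from the paper.

```python
def f_measure(true_labels, predicted_labels, beta=1.0):
    """Return the F-beta score for binary labels (1 = humorous, 0 = not humorous).

    Assumption: beta = 1 gives the balanced F1 score; the paper's exact
    variant is not stated in the abstract.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # With no true positives, precision and recall are zero or undefined,
    # so the score is reported as 0.0 by convention.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy, made-up labels for illustration only (not data from the paper).
gold = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
score = f_measure(gold, predicted)
```

Because the F-measure combines precision and recall, a model cannot score well simply by labeling every input as humorous or every input as not humorous when the classes are reasonably balanced.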