CL AI Dec 18, 2023

From Generalized Laughter to Personalized Chuckles: Unleashing the Power of Data Fusion in Subjective Humor Detection

Julita Bielaniewicz, Przemysław Kazienko

arXiv:2312.11296v1 | 0.5 | h-index: 7 | 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

Originality: Incremental advance

AI Analysis

This work addresses the challenge of subjectivity in NLP for humor detection, though it is incremental because it builds on existing personalized and generalized approaches.

The paper tackles humor detection in NLP by incorporating personalized data into training, showing that joining all available personalized datasets to train a personalized model boosts the macro F1 score by up to approximately 35%, a gain observed on all five personalized test sets.

Subjectivity in Natural Language Processing (NLP) poses a challenge to solutions designed for generalized tasks. Because research on generalized NLP is far more advanced, a large gap remains for tasks in which opinion, taste, or feelings are inherent, which creates a need for solutions based on data fusion. We chose the task of funniness detection because it relies heavily on the sense of humor, which is fundamentally subjective. Our experiments across five personalized and four generalized datasets, involving several personalized deep neural architectures, show that humor detection benefits greatly from including personalized data in the training process. We tested five training data fusion scenarios that focused on either generalized (majority voting) or personalized approaches to humor detection. The best results were obtained for the setup in which all available personalized datasets were joined to train the personalized reasoning model. It boosted prediction performance by up to approximately 35% of the macro F1 score. Such a significant gain was observed for all five personalized test sets. At the same time, the impact of the model's architecture was much smaller than that of personalization itself. It seems that concatenating personalized datasets, even at the cost of normalizing the range of annotations across all datasets, results in an enormous increase in humor detection performance when combined with personalized models.
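The two kinds of training data mentioned in the abstract, concatenated personalized annotations with a normalized range and generalized majority-vote labels, can be illustrated with a minimal sketch. It is not the authors' pipeline: the dataset names, annotation ranges, min-max rescaling, the 0.5 funniness threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict

def normalize(score, lo, hi):
    """Min-max rescale a raw annotation from [lo, hi] to [0, 1] (assumed scheme)."""
    return (score - lo) / (hi - lo)

def fuse_personalized(datasets):
    """Concatenate datasets after rescaling, keeping annotator identity.

    `datasets` maps a hypothetical dataset name to (lo, hi, rows), where each
    row is (text_id, annotator_id, raw_score). IDs are prefixed with the
    dataset name so they stay unique after concatenation.
    """
    fused = []
    for name, (lo, hi, rows) in datasets.items():
        for text_id, annotator_id, score in rows:
            fused.append(
                (f"{name}:{text_id}", f"{name}:{annotator_id}", normalize(score, lo, hi))
            )
    return fused

def majority_vote(fused, threshold=0.5):
    """Generalized label per text: funny if a strict majority of annotators say so.

    The threshold and the "ties count as not funny" rule are assumptions.
    """
    votes = defaultdict(list)
    for text_id, _annotator_id, score in fused:
        votes[text_id].append(score >= threshold)
    return {text_id: 2 * sum(v) > len(v) for text_id, v in votes.items()}

# Toy data: dataset "A" uses a 1-5 scale, dataset "B" a binary 0/1 scale.
datasets = {
    "A": (1, 5, [("t1", "u1", 5), ("t1", "u2", 1), ("t1", "u3", 4), ("t2", "u1", 2)]),
    "B": (0, 1, [("t1", "u1", 1), ("t1", "u2", 0)]),
}
fused = fuse_personalized(datasets)   # personalized view: one row per (text, annotator)
labels = majority_vote(fused)         # generalized view: one label per text
```

In the personalized view every annotator's rescaled score is kept as a separate training example, while the generalized view collapses disagreement into a single label per text; the abstract reports that training on the former is what drives the gain.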
