CL · Apr 9, 2024

Generalizable Sarcasm Detection Is Just Around The Corner, Of Course!

arXiv:2404.06357v2 · 33 citations · h-index: 8 · NAACL
AI Analysis

This work highlights the challenge of generalizing sarcasm detection across domains and styles. Its contribution is incremental but relevant to NLP researchers focusing on robust language understanding.

The study tested sarcasm detection models on datasets with varying sarcasm characteristics. Models fine-tuned on third-party labels performed better within a dataset, most models failed to generalize across datasets, and models fine-tuned on a new dataset released with the paper generalized best.

We tested the robustness of sarcasm detection models by examining their behavior when fine-tuned on four sarcasm datasets containing varying characteristics of sarcasm: label source (authors vs. third-party), domain (social media/online vs. offline conversations/dialogues), and style (aggressive vs. humorous mocking). We tested their prediction performance on the same dataset (intra-dataset) and across different datasets (cross-dataset). For intra-dataset predictions, models consistently performed better when fine-tuned with third-party labels rather than with author labels. For cross-dataset predictions, most models failed to generalize well to the other datasets, implying that one type of dataset cannot represent all sorts of sarcasm with different styles and domains. Compared to the existing datasets, models fine-tuned on the new dataset we release in this work showed the highest generalizability to other datasets. With a manual inspection of the datasets and post-hoc analysis, we attributed the difficulty in generalization to the fact that sarcasm actually comes in different domains and styles. We argue that future sarcasm research should take the broad scope of sarcasm into account.
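The intra- versus cross-dataset protocol described above can be pictured as a score matrix: fine-tune on each dataset (rows) and evaluate on every dataset (columns). This is a minimal sketch under stated assumptions, not the authors' released code. The paper fine-tunes pretrained language models, whereas here a toy keyword learner (`keyword_fine_tune`, hypothetical) stands in for fine-tuning. Macro-F1 is assumed as the metric, the datasets `A` and `B` are made-up two-example placeholders, and `generalizability` (the mean off-diagonal score) is an illustrative summary, not necessarily the paper's measure.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label): 1 = sarcastic, 0 = not sarcastic
Model = Callable[[str], int]


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Macro-averaged F1 over the two classes (assumed metric)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def cross_dataset_matrix(
    splits: Dict[str, Tuple[List[Example], List[Example]]],
    fine_tune: Callable[[List[Example]], Model],
) -> Dict[str, Dict[str, float]]:
    """Train on each dataset's train split, score on every dataset's test split.

    matrix[train][test]: diagonal = intra-dataset, off-diagonal = cross-dataset.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for train_name, (train, _) in splits.items():
        model = fine_tune(train)
        matrix[train_name] = {}
        for test_name, (_, test) in splits.items():
            gold = [label for _, label in test]
            pred = [model(text) for text, _ in test]
            matrix[train_name][test_name] = macro_f1(gold, pred)
    return matrix


def generalizability(matrix: Dict[str, Dict[str, float]], train_name: str) -> float:
    """Mean cross-dataset score for one training set (hypothetical summary)."""
    off_diagonal = [s for test, s in matrix[train_name].items() if test != train_name]
    return sum(off_diagonal) / len(off_diagonal)


def keyword_fine_tune(train: List[Example]) -> Model:
    """Hypothetical stand-in for fine-tuning: words seen only in sarcastic examples are cues."""
    sarcastic = {w for text, y in train if y == 1 for w in text.split()}
    literal = {w for text, y in train if y == 0 for w in text.split()}
    cues = sarcastic - literal
    return lambda text: int(any(w in cues for w in text.split()))


# Made-up toy datasets: (train split, test split)
splits = {
    "A": ([("oh great another monday", 1), ("i like mondays", 0)],
          [("oh great rain", 1), ("i like rain", 0)]),
    "B": ([("wow what a genius move", 1), ("that was a smart move", 0)],
          [("wow so helpful", 1), ("that was helpful", 0)]),
}

matrix = cross_dataset_matrix(splits, keyword_fine_tune)
for train_name, row in matrix.items():
    print(train_name, {k: round(v, 3) for k, v in row.items()},
          "generalizability:", round(generalizability(matrix, train_name), 3))
```

In this toy run each model scores perfectly on its own test split but drops to 1/3 macro-F1 on the other, mirroring the qualitative pattern of strong intra-dataset and weak cross-dataset performance. Swapping `keyword_fine_tune` for a real fine-tuning routine leaves the evaluation loop unchanged.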

Code Implementations: 1 repo