CL · Sep 17, 2022

News Headlines Dataset For Sarcasm Detection

arXiv:2212.06035v1 · 28 citations · h-index: 8
AI Analysis

This dataset addresses the need for less noisy and more accessible sarcasm detection resources for NLP researchers, though it is incremental as it builds on existing data collection methods.

The authors tackled the problem of noisy and context-dependent sarcasm detection datasets by curating a new dataset of 28K news headlines, with 13K sarcastic examples, from TheOnion and HuffPost to provide cleaner labels and broader applicability.

Past studies in sarcasm detection mostly use Twitter datasets collected with hashtag-based supervision, but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to other tweets, and detecting sarcasm in these requires the availability of contextual tweets. To overcome the limitations related to noise in Twitter datasets, we curate the News Headlines Dataset from two news websites: TheOnion, which aims at producing sarcastic versions of current events, and HuffPost, which publishes real news. The dataset contains about 28K headlines, of which 13K are sarcastic. To make it more useful, we have included the source links of the news articles so that more data can be extracted as needed. In this paper, we describe various details about the dataset and potential use cases apart from sarcasm detection.
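As a rough illustration of how a dataset of labeled headlines with source links might be inspected, the sketch below parses a few records and tallies labels and source sites. The JSON-lines layout, the field names `headline`, `is_sarcastic`, and `article_link`, and the sample records themselves are assumptions made for this example; the abstract does not specify the file format.

```python
import json
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample records. The field names and the JSON-lines layout
# are assumptions for illustration, not confirmed by the abstract.
SAMPLE_JSONL = """\
{"headline": "example satirical headline", "is_sarcastic": 1, "article_link": "https://www.theonion.com/example-1"}
{"headline": "example news headline", "is_sarcastic": 0, "article_link": "https://www.huffpost.com/entry/example-2"}
{"headline": "another news headline", "is_sarcastic": 0, "article_link": "https://www.huffpost.com/entry/example-3"}
"""


def load_records(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def label_counts(records):
    """Count records per label (1 = sarcastic, 0 = not sarcastic)."""
    return Counter(r["is_sarcastic"] for r in records)


def source_domain(record):
    """Return the host of the article link, without a leading 'www.'."""
    host = urlparse(record["article_link"]).netloc
    return host[4:] if host.startswith("www.") else host


records = load_records(SAMPLE_JSONL)
counts = label_counts(records)
sarcastic_share = counts[1] / len(records)
by_source = Counter(source_domain(r) for r in records)

print(f"{len(records)} headlines, {sarcastic_share:.0%} sarcastic")
print(dict(by_source))
```

Because each record carries its source link, the same loop could be extended to fetch the full article text when more context is needed, which is the use the abstract suggests for the links.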
