CL · Oct 2, 2017

A Crowd-Annotated Spanish Corpus for Humor Analysis

arXiv:1710.00477v41093 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a dataset for researchers in computational humor, but the contribution is incremental because it focuses on a single language and domain.

The authors address the lack of human-curated data for computational humor tasks by building a crowd-annotated Spanish corpus of 27,000 tweets, with an inter-annotator agreement (Krippendorff's alpha) of 0.5710, to support humor detection and analysis.

Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. The inter-annotator agreement Krippendorff's alpha value is 0.5710. The dataset is available for general use and can serve as a basis for humor detection and as a first step to tackle subjectivity.
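The agreement figure quoted above is a Krippendorff's alpha, which copes with items that received different numbers of annotations. The following is a minimal sketch of the nominal-data version of that statistic, assuming labels are unordered categories such as "humor" and "not_humor". The function name `krippendorff_alpha_nominal`, the label strings, and the toy data are illustrative assumptions, not taken from the paper or its repository, and this is not necessarily how the authors computed 0.5710.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one inner list of labels per item (e.g. per tweet).
    Items may have different numbers of labels; items with fewer than two
    labels are skipped because they carry no agreement information.
    """
    # Coincidence matrix: every ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on that item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label value, and the grand total n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement, both scaled by n so that the
    # ratio below equals D_o / D_e.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")

    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy annotations, not data from the corpus.
    toy = [
        ["humor", "humor", "humor"],
        ["not_humor", "not_humor"],
        ["humor", "not_humor", "humor", "humor"],
        ["not_humor"],  # single annotation, ignored
    ]
    print(krippendorff_alpha_nominal(toy))
```

The funniness score is an ordered scale, so a nominal distance would treat a one-point gap the same as a four-point gap; an ordinal or interval distance function would be the more natural choice for that annotation.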

Code Implementations: 1 repo
