CL · Jul 6, 2019

Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction

arXiv:1907.03187v1 · 7 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses humor detection in Spanish social media and represents an incremental improvement in a specific domain.

The authors tackled humor prediction in Spanish tweets by training a language model from scratch on a large Twitter corpus and applying label smoothing to address label noise, achieving 3rd place in classification and 2nd place in regression in the HAHA 2019 Challenge.

Our entry into the HAHA 2019 Challenge placed $3^{rd}$ in the classification task and $2^{nd}$ in the regression task. We describe our system and innovations and compare our results to a Naive Bayes baseline. A large Twitter-based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels, we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
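To make the label smoothing idea concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes the common uniform-smoothing formulation, in which a fraction `epsilon` of the target probability mass is spread evenly over all classes. The value `epsilon=0.1` and the function names are illustrative assumptions; the abstract does not state which variant or value the system uses.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Build a smoothed target distribution (assumed uniform-smoothing variant).

    Every class receives epsilon / num_classes; the true class additionally
    keeps the remaining 1 - epsilon of the probability mass.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if i == target_index else uniform
        for i in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def label_smoothing_cross_entropy(logits, target_index, epsilon=0.1):
    """Cross-entropy between the smoothed targets and the model's softmax."""
    log_probs = log_softmax(logits)
    targets = smooth_labels(target_index, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Hypothetical two-class example (not humorous vs. humorous): the model is
# very confident in class 1, which is also the annotated label.
confident_logits = [0.0, 10.0]
plain_loss = label_smoothing_cross_entropy(confident_logits, 1, epsilon=0.0)
smoothed_loss = label_smoothing_cross_entropy(confident_logits, 1, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. With a positive `epsilon`, an extremely confident prediction is penalized even when it matches the label, which discourages the model from fully trusting labels that may be wrong.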

Code Implementations: 1 repo