CL · Jul 6, 2019

Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction

Bobak Farzin, Piotr Czapla, Jeremy Howard

arXiv:1907.03187 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses humor detection in Spanish social media and represents an incremental improvement in a specific domain.

The authors tackled humor prediction in Spanish tweets by training a language model from scratch on a large Twitter corpus and applying label smoothing to address label noise, achieving 3rd place in classification and 2nd place in regression in the HAHA 2019 Challenge.

Our entry into the HAHA 2019 Challenge placed $3^{rd}$ in the classification task and $2^{nd}$ in the regression task. We describe our system and innovations, and compare our results to a Naive Bayes baseline. A large Twitter-based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels, we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
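The label smoothing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the GitHub repository). It assumes the common uniform variant, in which a fraction `eps` of the target probability is spread evenly over all classes. The function names and the value `eps=0.1` are assumptions made for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_targets(true_class, num_classes, eps):
    """Soft target distribution: eps is spread uniformly over all classes
    and the remaining 1 - eps is added to the labeled class.
    (Assumed uniform-smoothing variant, chosen for illustration.)"""
    targets = [eps / num_classes] * num_classes
    targets[true_class] += 1.0 - eps
    return targets


def label_smoothing_cross_entropy(logits, true_class, eps=0.1):
    """Cross-entropy between the smoothed targets and the model's softmax.
    With eps=0 this reduces to ordinary cross-entropy."""
    log_probs = log_softmax(logits)
    targets = smoothed_targets(true_class, len(logits), eps)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# A confident, correct prediction for a two-class (humor / not humor) tweet.
logits = [4.0, 0.0]
plain = label_smoothing_cross_entropy(logits, true_class=0, eps=0.0)
smooth = label_smoothing_cross_entropy(logits, true_class=0, eps=0.1)
print(f"plain CE: {plain:.4f}  smoothed CE: {smooth:.4f}")
```

With smoothing, the loss on a very confident prediction is higher than plain cross-entropy. This discourages the model from committing fully to any single label, which is the intended effect when some training labels are wrong.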
