CL, May 28, 2018

UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish

Marloes Kuijper, Mike van Lenthe, Rik van Noord

arXiv:1805.10824

Originality: Synthesis-oriented

AI Analysis

This work addresses emotion analysis in Spanish social media. Its contribution is incremental, since it builds on existing multilingual and semi-supervised techniques.

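The study tackled emotion intensity prediction in Spanish tweets by generating additional training data in two ways: translating training data from other languages and applying semi-supervised learning. Models trained with the additional data outperformed the authors' regular models in all subtasks, and the submission ranked second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks.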

The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
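The abstract contrasts a stepwise ensemble with simple averaging but does not say how the stepwise ensemble was built. The following is only a minimal sketch of one common reading, under stated assumptions: greedy forward selection on a development set, scored with Pearson correlation. All names here (`pearson`, `average`, `stepwise_ensemble`, `dev_preds`, `gold`) and the toy numbers are hypothetical and not taken from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input; treat as 0
    return cov / (vx * vy) ** 0.5


def average(preds):
    """Simple averaging: element-wise mean over several prediction lists."""
    return [mean(column) for column in zip(*preds)]


def stepwise_ensemble(model_preds, gold):
    """Greedy forward selection (an assumed reading of 'stepwise').

    Repeatedly add the model whose inclusion gives the highest dev-set
    Pearson score for the averaged predictions; stop when no remaining
    model improves the score.
    """
    remaining = dict(model_preds)
    chosen, best = [], float("-inf")
    while remaining:
        scored = [
            (pearson(average([model_preds[n] for n in chosen] + [p]), gold), name)
            for name, p in remaining.items()
        ]
        score, name = max(scored)
        if score <= best:
            break
        best = score
        chosen.append(name)
        del remaining[name]
    return chosen, best


# Toy development-set predictions from three hypothetical models.
gold = [0.10, 0.40, 0.50, 0.90]
dev_preds = {
    "model_a": [0.20, 0.30, 0.60, 0.80],
    "model_b": [0.90, 0.10, 0.50, 0.20],
    "model_c": [0.15, 0.45, 0.40, 0.95],
}

plain_score = pearson(average(list(dev_preds.values())), gold)
chosen, stepwise_score = stepwise_ensemble(dev_preds, gold)
```

On the development set itself, greedy selection can never score below its best single model, so a fair comparison with plain averaging needs held-out data. That is the setting in which the abstract reports no gain from the stepwise ensemble.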
