UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
This work addresses emotion analysis in Spanish social media, but it is incremental as it builds on existing multilingual and semi-supervised techniques.
The study tackled emotion intensity prediction in Spanish tweets by generating additional training data through translation and semi-supervised learning. The resulting models outperformed the regular models in all subtasks and ranked between second and fifth in the four Spanish subtasks.
This paper describes our submission to SemEval-2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches: models trained with the additional data outperformed our regular models in all subtasks. However, creating a stepwise ensemble of different models, as opposed to simply averaging their predictions, did not increase performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
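
To make the contrast between simple averaging and a stepwise ensemble concrete, the following is a minimal sketch and not the procedure used in the submission. It assumes that "stepwise" means greedy forward selection on a development set and that models are scored by Pearson correlation with the gold intensities; the function names, the selection rule, and the toy data are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant input; treat as 0
    return cov / (var_x * var_y) ** 0.5


def average_predictions(model_preds):
    """Element-wise mean over the prediction lists of several models."""
    return [mean(values) for values in zip(*model_preds)]


def stepwise_ensemble(preds_by_model, gold):
    """Greedy forward selection (assumed rule): keep adding the model whose
    inclusion gives the best averaged dev-set score, and stop as soon as no
    remaining model improves that score."""
    remaining = dict(preds_by_model)
    selected = []
    best_score = float("-inf")
    while remaining:
        # Score every candidate when averaged with the models chosen so far.
        scored = []
        for name, preds in remaining.items():
            chosen = [preds_by_model[n] for n in selected]
            combo = average_predictions(chosen + [preds])
            scored.append((pearson(combo, gold), name))
        score, name = max(scored)
        if score <= best_score:
            break  # no candidate improves the ensemble
        best_score = score
        selected.append(name)
        del remaining[name]
    return selected, best_score


# Hypothetical dev-set data: one well-correlated model and one noisy model.
gold = [0.1, 0.4, 0.6, 0.9]
preds = {"a": [0.2, 0.4, 0.6, 0.8], "b": [0.9, 0.1, 0.8, 0.2]}
selected, score = stepwise_ensemble(preds, gold)
print(selected, round(score, 3))
```

In this toy setting the stepwise procedure keeps only the well-correlated model, whereas averaging both models would lower the score. On real data the two strategies can perform similarly, which is consistent with the finding above that the stepwise ensemble brought no improvement over simple averaging.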