LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment Analysis of Hinglish Social Media Text
This work addresses sentiment analysis of Hinglish social media text; its contribution is incremental, as it builds on existing embedding methods.
The paper tackles sentiment analysis of Hinglish social media text by comparing cross-lingual embeddings with incrementally retrained English embeddings; the retrained approach performs best, with an F1-score of 70.52% on the held-out test data.
This paper describes our contribution to SemEval-2020 Task 9 on Sentiment Analysis for Code-mixed Social Media Text. We investigated two approaches to Hinglish sentiment analysis. The first approach uses cross-lingual embeddings obtained by projecting Hinglish and pre-trained English FastText word embeddings into the same space. The second approach uses pre-trained English embeddings that are incrementally retrained on a set of Hinglish tweets. The results show that the second approach performs best, with an F1-score of 70.52% on the held-out test data.
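
To illustrate what projecting two embedding sets into the same space can look like, the following is a minimal sketch using orthogonal Procrustes alignment. The abstract does not name the projection method, so Procrustes is an assumption here, chosen because it is a common technique for aligning embedding spaces. The function name `procrustes_align` and the toy random vectors are hypothetical and stand in for real Hinglish and English FastText embeddings of a seed dictionary.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of src and row i of tgt
    are the vectors of a translation pair from a seed dictionary.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data (assumption): random vectors instead of real FastText embeddings.
rng = np.random.default_rng(0)
dim = 4
english = rng.normal(size=(10, dim))

# Simulate a "Hinglish" space that is an unknown rotation of the English one.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
hinglish = english @ true_rotation.T

# Learn the mapping from the paired vectors and project Hinglish into
# the English space.
W = procrustes_align(hinglish, english)
projected = hinglish @ W
```

In this toy setting the Hinglish space is an exact rotation of the English space, so the learned mapping recovers the alignment. Real embedding spaces are only approximately related by a rotation, and the quality of the projection would depend on the seed dictionary.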