Reed at SemEval-2020 Task 9: Fine-Tuning and Bag-of-Words Approaches to Code-Mixed Sentiment Analysis
This paper addresses sentiment analysis for code-mixed social media data, but its contribution is incremental: it applies standard methods to a specific competition task.
The paper tackles sentiment analysis on Hinglish (code-mixed Hindi-English) tweets in the SemEval-2020 competition; the authors' best model achieved an F-score of 71.3%, ranking 4th out of 62 entries.
We explore the task of sentiment analysis on Hinglish (code-mixed Hindi-English) tweets as participants of Task 9 of the SemEval-2020 competition, known as the SentiMix task. We had two main approaches: 1) applying transfer learning by fine-tuning pre-trained BERT models and 2) training feedforward neural networks on bag-of-words representations. During the evaluation phase of the competition, we obtained an F-score of 71.3% with our best model, which placed $4^{th}$ out of 62 entries in the official system rankings.
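As a rough illustration of the second approach, the sketch below builds bag-of-words count vectors of the kind a feedforward classifier could take as input. It is a minimal sketch only: the whitespace tokenization, the lowercasing, the `min_count` parameter, the helper names `build_vocab` and `vectorize`, and the toy sentences are all assumptions made for illustration and are not taken from the paper or the SentiMix data.

```python
from collections import Counter


def build_vocab(texts, min_count=1):
    """Map each token seen at least min_count times to a column index.

    Hypothetical helper: lowercases and splits on whitespace (an assumption).
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(text, vocab):
    """Return a count vector for text; out-of-vocabulary tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


# Toy, made-up Hinglish-style examples (not from the competition data).
train_texts = [
    "movie bahut accha tha",
    "movie bakwas tha",
    "bahut bahut accha",
]

vocab = build_vocab(train_texts)
features = [vectorize(t, vocab) for t in train_texts]
```

Each row of `features` has one column per vocabulary token, so word order is discarded and only token counts remain. A feedforward network would then map such a vector to sentiment classes.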