HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
This work addresses sentiment analysis for under-explored code-mixed social media text, though it appears incremental as it primarily benchmarks existing methods without introducing new techniques.
The paper tackles sentiment detection in Hinglish (Hindi-English code-mixed) social media text by benchmarking fine-tuned transformer models against classical machine learning methods. It finds that an NB-SVM model outperforms RoBERTa by 6.2% relative F1 and that a majority-vote ensemble achieves the best F1 of 0.707.
Sentiment analysis for code-mixed social media text continues to be an under-explored area. This work adds two common approaches: fine-tuning large transformer models and sample-efficient methods like ULMFiT. Prior work demonstrates the efficacy of classical ML methods for polarity detection. Fine-tuned general-purpose language representation models, such as those of the BERT family, are benchmarked along with classical machine learning and ensemble methods. We show that NB-SVM beats RoBERTa by 6.2% (relative) F1. The best-performing model is a majority-vote ensemble, which achieves an F1 of 0.707. The leaderboard submission was made under the CodaLab username nirantk, with an F1 of 0.689.
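
The abstract names a majority-vote ensemble as the best-performing model but does not describe how votes are combined. The following is a minimal sketch of one common form of majority voting over per-model label predictions, not the authors' implementation. The function name `majority_vote`, the three label strings, the toy predictions, and the tie-breaking rule (prefer the earliest-listed model among tied labels) are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists into one label per example.

    predictions[m][i] is the label model m assigns to example i.
    Ties are broken in favour of the earliest-listed model whose label
    is among the most frequent ones (an assumption, not from the paper).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must label the same number of examples")

    combined = []
    # zip(*predictions) yields, for each example, the tuple of votes across models.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        winner = next(label for label in votes if counts[label] == top)
        combined.append(winner)
    return combined


# Toy predictions from three hypothetical models (illustrative values only).
model_a = ["positive", "negative", "neutral"]
model_b = ["positive", "neutral", "negative"]
model_c = ["negative", "neutral", "positive"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)  # ['positive', 'neutral', 'neutral']
```

In the third example every model disagrees, so the result falls back to the first model's label. With this rule, listing the strongest single model first is a reasonable default, though other tie-breaking schemes (for example, weighting by validation F1) are equally plausible.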