CLOct 13, 2023

BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts

arXiv:2310.09238v221.0131 citationsh-index: 2

Originality Synthesis-oriented

AI Analysis

This work addresses sentiment analysis for Bangla, a low-resource language, but is incremental as it applies existing methods to a specific dataset.

The paper tackled sentiment analysis of Bangla social media posts by benchmarking Transformer models, achieving a micro-F1 score of 67.02% and ranking 21st in a shared task.

Bangla is the 7th most widely spoken language globally, with a staggering 234 million native speakers primarily hailing from India and Bangladesh. This morphologically rich language boasts a rich literary tradition, encompassing diverse dialects and language-specific challenges. Despite its linguistic richness and history, Bangla remains categorized as a low-resource language within the natural language processing (NLP) and speech community. This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop. We experiment with various Transformer-based architectures to solve this task. Our quantitative results show that transfer learning really helps in better learning of the models in this low-resource language scenario. This becomes evident when we further finetune a model which has already been finetuned on twitter data for sentiment analysis task and that finetuned model performs the best among all other models. We also perform a detailed error analysis where we find some instances where ground truth labels need to be relooked at. We obtain a micro-F1 of 67.02\% on the test set and our performance in this shared task is ranked at 21 in the leaderboard.

View on arXiv PDF

Similar