A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts
This work addresses a specific problem for researchers working with low-resource languages such as Bangla, but its contribution is incremental: it builds on existing sentiment analysis work with new data and baseline methods.
The paper tackles sentiment analysis on noisy Bangla texts by introducing a manually annotated dataset (NC-SentNoB) that labels 10 noise types across around 15K texts, together with baseline noise reduction methods. It finds these methods unsatisfactory, and no concrete performance numbers are reported.
While Bangla is considered a language with limited resources, sentiment analysis has been a subject of extensive research in the literature. Nevertheless, sentiment analysis on noisy Bangla texts specifically remains little explored. In this paper, we introduce a dataset (NC-SentNoB) that we annotated manually to identify ten different types of noise found in a pre-existing sentiment analysis dataset comprising around 15K noisy Bangla texts. First, given a noisy input text, we identify its noise types, treating this as a multi-label classification task. Then, we introduce baseline noise reduction methods to alleviate noise before conducting sentiment analysis. Finally, we compare the performance of fine-tuned sentiment analysis models on noisy and noise-reduced texts. The experimental findings indicate that the noise reduction methods used are not satisfactory, highlighting the need for more suitable noise reduction methods in future research. We have made the implementation and dataset presented in this paper publicly available at https://github.com/ktoufiquee/A-Comparative-Analysis-of-Noise-Reduction-Methods-in-Sentiment-Analysis-on-Noisy-Bangla-Texts
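
The multi-label framing of noise identification mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the ten noise types, so the names in `NOISE_TYPES` are placeholders, and the helper functions (`to_multi_hot`, `predict_labels`, `micro_f1`), the 0.5 threshold, and the choice of micro-F1 as a metric are all assumptions made for illustration. The point is only that each text can carry several noise labels at once, so targets are 0/1 vectors and each label is decided independently.

```python
import math

# Placeholder names: the abstract states there are ten noise types
# but does not name them, so these identifiers are hypothetical.
NOISE_TYPES = [f"noise_type_{i}" for i in range(10)]


def to_multi_hot(labels, label_names=NOISE_TYPES):
    """Encode a collection of label names as a 0/1 vector, one slot per label."""
    labels = set(labels)
    unknown = labels - set(label_names)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in label_names]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5, label_names=NOISE_TYPES):
    """Score each label independently and keep those at or above the threshold."""
    return [
        name
        for name, z in zip(label_names, logits)
        if sigmoid(z) >= threshold
    ]


def micro_f1(true_vectors, pred_vectors):
    """Micro-averaged F1 over all (sample, label) pairs of 0/1 vectors."""
    tp = fp = fn = 0
    for truth, pred in zip(true_vectors, pred_vectors):
        for t, p in zip(truth, pred):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a real setup the logits would come from a fine-tuned classifier with one output per noise type; here they are simply passed in as a list, so the sketch shows only the target encoding and the per-label decision rule.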