Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models
This work addresses the limited resources available for Bangla sentiment analysis and enables more comparable and reusable research, though the contribution is incremental: it applies existing methods to new data.
The authors address the lack of a standard dataset for sentiment analysis in Bangla and Romanized Bangla by creating a substantial, validated dataset and testing it with a deep recurrent model (LSTM) under binary and categorical crossentropy loss functions, reporting promising results.
Sentiment Analysis (SA) is an active research area in the digital age. With the rapid and constant growth of online social media sites and services, and the increasing amount of textual data available on them (statuses, comments, reviews, etc.), the application of automatic SA is on the rise. However, most research on SA in natural language processing (NLP) is based on the English language. Despite being the sixth most widely spoken language in the world, Bangla still does not have a large, standard dataset. Because of this, recent research in Bangla has failed to produce results that are both comparable to work done by others and reusable as stepping stones for future researchers in this field. Therefore, we first provide a textual dataset that includes not just Bangla but also Romanized Bangla texts, and that is substantial, post-processed, validated multiple times, and ready to be used in SA experiments. We tested this dataset with a deep recurrent model, specifically Long Short-Term Memory (LSTM), using two types of loss functions: binary crossentropy and categorical crossentropy. We also experimented with pre-training, using data from one validation to pre-train the other and vice versa. Lastly, we documented the results, which were promising, along with some analysis of them.
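As an illustration of the two loss functions named above, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the example probabilities, and the three-class setup are assumptions made for illustration only; the abstract does not state the label set or the framework used.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def binary_crossentropy(y_true, p_pos):
    """Loss for a single sigmoid output: y_true is 0 or 1, p_pos is P(label = 1)."""
    p = min(max(p_pos, EPS), 1.0 - EPS)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Loss for a softmax output: one_hot and probs have one entry per class."""
    return -sum(t * math.log(max(p, EPS)) for t, p in zip(one_hot, probs))


# Binary setup: one output unit, label 1 (e.g. "positive"), predicted P = 0.8.
bce = binary_crossentropy(1, 0.8)

# Categorical setup: hypothetical three classes, true class is index 0.
probs = softmax([2.0, 0.5, -1.0])
cce = categorical_crossentropy([1, 0, 0], probs)

print(f"binary crossentropy:      {bce:.4f}")
print(f"categorical crossentropy: {cce:.4f}")
```

With exactly two classes and a one-hot target, the categorical form reduces to the binary one, so the practical difference lies in how the output layer and labels are encoded (one sigmoid unit versus one softmax unit per class).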