JU_KS@SAIL_CodeMixed-2017: Sentiment Analysis for Indian Code Mixed Social Media Texts
This work addresses sentiment analysis for Indian code-mixed social media data, but it is incremental: it applies existing methods to a specific dataset without major innovations.
The paper tackles sentiment analysis for Hindi-English and Bengali-English code-mixed social media texts using Multinomial Naïve Bayes with n-gram and SentiWordNet features, achieving 3rd place in the contest with performance close to that of the best system.
This paper reports on our work in the NLP Tool Contest @ICON-2017 shared task on Sentiment Analysis for Indian Languages (SAIL) (code-mixed). To implement our system, we used a machine learning algorithm, Multinomial Naïve Bayes, trained on n-gram and SentiWordNet features. We used a small SentiWordNet for English and a small SentiWordNet for Bengali, but no SentiWordNet for Hindi. We tested our system on the Hindi-English and Bengali-English code-mixed social media data sets released for the contest. The performance of our system is very close to that of the best system that participated in the contest. For both the Bengali-English and Hindi-English runs, our system was ranked 3rd among all submitted runs and was awarded the 3rd prize in the contest.
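The abstract does not spell out how the n-gram and SentiWordNet features are combined, so the following is only a minimal sketch of one plausible setup, not the authors' implementation. It assumes word unigrams and bigrams, Laplace smoothing, and lexicon hits encoded as pseudo-features. The names `TOY_LEXICON`, `extract_features`, and the `__LEX_*__` features are hypothetical, and the training sentences and two-label scheme are invented toy data rather than contest data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy polarity lexicon standing in for small SentiWordNet lists.
TOY_LEXICON = {"good": "pos", "bhalo": "pos", "love": "pos",
               "bad": "neg", "kharap": "neg", "hate": "neg"}


def extract_features(text, max_n=2):
    """Bag of word n-grams (n <= max_n) plus one pseudo-feature per lexicon hit."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    for tok in tokens:
        polarity = TOY_LEXICON.get(tok)
        if polarity:
            feats["__LEX_" + polarity.upper() + "__"] += 1
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values())
                       for c, fc in self.feature_counts.items()}
        return self

    def predict(self, text):
        feats = extract_features(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for f, count in feats.items():
                if f not in self.vocab:
                    continue  # features never seen in training are skipped
                score += count * math.log(
                    (self.feature_counts[c][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented code-mixed toy examples, for illustration only.
train_texts = ["movie ta khub bhalo", "yeh film bahut bad hai",
               "i love this gaan", "ami hate kori eta"]
train_labels = ["positive", "negative", "positive", "negative"]
model = MultinomialNB().fit(train_texts, train_labels)

print(model.predict("khub bhalo"))
print(model.predict("eta kharap"))
```

In this sketch the lexicon pseudo-features let a word such as "kharap", which never appears in the toy training set, still contribute evidence through the shared `__LEX_NEG__` count. That is one way a small sentiment lexicon could complement sparse n-gram statistics.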