CL, Aug 9, 2018

Code-Mixed Sentiment Analysis Using Machine Learning and Neural Network Approaches

Pruthwik Mishra, Prathyusha Danda, Pranav Dhakras

arXiv:1808.03299v10.527 citations

Originality Synthesis-oriented

AI Analysis

This work addresses sentiment analysis for code-mixed languages; it represents an incremental improvement on a domain-specific task.

The paper tackled sentiment analysis for code-mixed Indian languages (HI-EN and BN-EN) by submitting four models, including an ensemble voting classifier and a linear SVM using TF-IDF features, and won first place in the SAIL contest with F-scores of 0.569 for HI-EN and 0.526 for BN-EN.

The Sentiment Analysis for Indian Languages (SAIL)-Code Mixed tools contest aimed at identifying the sentence-level sentiment polarity of code-mixed datasets of Indian language pairs (Hi-En, Ben-Hi-En). The Hi-En dataset is henceforth referred to as HI-EN and the Ben-Hi-En dataset as BN-EN. We submitted four models for sentiment analysis of the code-mixed HI-EN and BN-EN datasets. The first model was an ensemble voting classifier consisting of three classifiers (linear SVM, logistic regression, and random forests), while the second was a linear SVM. Both models used TF-IDF feature vectors of character n-grams, where n ranged from 2 to 6. We used the scikit-learn (sklearn) machine learning library to implement both approaches. Run1 was obtained from the voting classifier, and Run2 used the linear SVM model. Of the four submitted outputs, Run2 outperformed Run1 on both datasets. We finished first in the contest on both HI-EN, with an F-score of 0.569, and BN-EN, with an F-score of 0.526.
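
The two runs described in the abstract can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code: the abstract does not state the voting scheme, the n-gram analyzer variant, or any hyperparameters, so hard voting, `analyzer="char"`, and the default settings below are assumptions. The helper names (`char_tfidf`, `build_run1`, `build_run2`), the label set, and the toy sentences are invented for the example.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def char_tfidf():
    # TF-IDF over character n-grams with n from 2 to 6, as in the abstract.
    # analyzer="char" (rather than "char_wb") is an assumption.
    return TfidfVectorizer(analyzer="char", ngram_range=(2, 6))


def build_run1():
    # Ensemble of linear SVM, logistic regression, and random forest.
    # Hard (majority) voting is assumed; LinearSVC exposes no
    # predict_proba, so soft voting would not work with it directly.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", LinearSVC()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="hard",
    )
    return make_pipeline(char_tfidf(), ensemble)


def build_run2():
    # A single linear SVM on the same character n-gram features.
    return make_pipeline(char_tfidf(), LinearSVC())


# Invented toy data for illustration only; not taken from the SAIL datasets.
train_texts = [
    "yeh movie bahut acchi thi, loved it",
    "kya bakwas film hai, total waste",
    "aaj office jana hai at 9 am",
    "mast song hai yaar, super hit",
    "bahut bura experience, never again",
    "kal meeting hai in the morning",
]
train_labels = [
    "positive", "negative", "neutral",
    "positive", "negative", "neutral",
]

run1 = build_run1().fit(train_texts, train_labels)
run2 = build_run2().fit(train_texts, train_labels)

test_texts = ["film acchi thi", "total bakwas"]
print("Run1:", list(run1.predict(test_texts)))
print("Run2:", list(run2.predict(test_texts)))
```

Character n-grams are a common choice for code-mixed, romanized text because they tolerate the inconsistent spellings that word-level features would treat as unrelated tokens. With real data, the two pipelines would be fitted on the contest training split and compared on held-out data using the contest's F-score.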
