Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text (Hinglish)
This is an incremental approach to a domain-specific problem in natural language processing for social media analysis.
The paper tackled sentiment analysis for code-mixed Hindi-English social media text using the J48 decision tree classifier, achieving F1 scores of 0.4972 and 0.5316 on test data.
This paper describes the design of the system submitted to SemEval-2020 Task 9, which required sentiment analysis of code-mixed Hindi-English text. The system uses Weka to provide the classifier that labels the tweets, and Python to load the provided data files and clean them. Only part of the training data was given to the system for classifying the tweets in the test data set on which the system was evaluated. System performance was assessed using the official competition evaluation metric, the F1 score. The classifier was trained on two sets of training data, which resulted in F1 scores of 0.4972 and 0.5316.
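The Python cleaning step and the hand-off to Weka could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the function names `clean_tweet` and `to_arff`, the specific cleaning rules (lowercasing, removing URLs, mentions, and punctuation), the relation name, and the three-label set are all hypothetical choices. It also assumes romanized Hinglish text, since the punctuation filter would strip Devanagari characters.

```python
import re

# Hypothetical cleaning rules; the paper does not list its exact steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def to_arff(rows, relation="hinglish_sentiment"):
    """Render (tweet, label) pairs as ARFF text that Weka can load.

    The label set below is an assumption made for this sketch.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute sentiment {positive,negative,neutral}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Cleaned text contains no quotes, so single-quoting is safe.
        lines.append(f"'{clean_tweet(text)}',{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("@user Yeh movie bahut AMAZING hai!!! https://t.co/abc", "positive")]
    print(to_arff(sample))
```

A file produced this way holds the tweet as a single string attribute. J48 does not accept string attributes directly, so in Weka the text would first need to be converted to word features, for example with the StringToWordVector filter, before the tree is trained.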