CL IR LG SI, Nov 3, 2019

Sentiment analysis model for Twitter data in Polish language

arXiv:1911.00985v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This is an incremental application of existing methods to a new dataset (Polish Twitter data), addressing sentiment analysis for social media monitoring in a specific language context.

The paper tackled sentiment analysis of Polish tweets from the 2015 presidential election by labeling each tweet with a sentiment score computed from emoticons and words, then training classifiers on those labels. Naive Bayes and Maximum Entropy classifiers achieved accuracies of 71.76% and 77.32%, respectively.

This work is a text mining analysis of tweets gathered during the Polish presidential election on May 10th, 2015. The project included implementing an engine to retrieve information from Twitter, building document corpora, cleaning the corpora, and creating a Term-Document Matrix. Each tweet in the corpus was assigned a category based on its sentiment score, which was calculated from the number of positive and/or negative emoticons and Polish words in each document. The resulting data set was used to train and test four machine learning classifiers in order to select those providing the most accurate automatic tweet classification. The Naive Bayes and Maximum Entropy algorithms achieved the best accuracies, 71.76% and 77.32%, respectively. All implementation tasks were completed using the R programming language.
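
The lexicon-based labeling step described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the paper's implementation used R. The emoticon and word lists, the function names `sentiment_score` and `categorize`, the +1/-1 weighting, and the three-way split at zero are all assumptions made for illustration; the abstract does not specify the actual lexicons, weights, or category boundaries.

```python
import string

# Hypothetical toy lexicons (assumptions); the paper's real lists are not given here.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":/"}
POSITIVE_WORDS = {"dobry", "super", "brawo"}
NEGATIVE_WORDS = {"zły", "wstyd", "kłamstwo"}


def sentiment_score(tweet):
    """Return (# positive hits) - (# negative hits) over emoticons and words."""
    score = 0
    for token in tweet.split():
        # Emoticons are matched as-is, before any punctuation stripping.
        if token in POSITIVE_EMOTICONS:
            score += 1
        elif token in NEGATIVE_EMOTICONS:
            score -= 1
        else:
            # Words are lowercased and stripped of surrounding ASCII punctuation.
            word = token.strip(string.punctuation).lower()
            if word in POSITIVE_WORDS:
                score += 1
            elif word in NEGATIVE_WORDS:
                score -= 1
    return score


def categorize(score):
    """Map a score to a category; splitting at zero is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Brawo! Super wynik :)", "Wstyd i kłamstwo :(", "Wybory 2015"]:
        print(tweet, "->", categorize(sentiment_score(tweet)))
```

Labels produced this way would serve as the training targets for the classifiers. Because they come from a fixed lexicon rather than human annotation, classifier accuracy measured against them reflects agreement with the lexicon rule.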

