SI Aug 17, 2015
In Quest of Significance: Identifying Types of Twitter Sentiment Events that Predict Spikes in Sales
Olga Kolchyna, Tharsis T. P. Souza, Tomaso Aste et al.
We study the power of Twitter events to predict consumer sales events by analysing sales for 75 companies from the retail sector and over 150 million tweets mentioning those companies, along with their sentiment. We suggest an approach to event identification on Twitter that extends existing event-study methodologies. We also propose a robust method for clustering Twitter events into different types based on their shape, which captures the varying dynamics of information propagation through the social network. We provide empirical evidence that, by differentiating events based on their shape, we can clearly identify types of Twitter events that have more significant power to predict spikes in sales than the aggregated Twitter signal.
CL Jul 3, 2015
Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination
Olga Kolchyna, Tharsis T. P. Souza, Philip Treleaven et al.
This paper covers two approaches to sentiment analysis: i) the lexicon-based method; ii) the machine learning method. We describe several techniques to implement these approaches and discuss how they can be adopted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of the feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods on Twitter, we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that the machine learning method based on SVM and Naive Bayes classifiers outperforms the lexicon method. We present a new ensemble method that uses a lexicon-based sentiment score as an input feature for the machine learning approach. The combined method produced more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets improves sentiment classification performance by up to 7%.
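As a rough illustration of two ideas in this abstract (extending a sentiment lexicon with emoticons and slang, and passing the lexicon score to a classifier as an input feature), the following is a minimal sketch only. The lexicon entries, weights, tokenizer and the `__lexicon_score__` feature name are all hypothetical and are not taken from the paper; the classifier itself is left out.

```python
from collections import Counter

# Hypothetical toy lexicons; entries and weights are illustrative only.
BASE_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
SOCIAL_LEXICON = {":)": 1.0, ":(": -1.0, "lol": 0.5, "gr8": 2.0, "meh": -0.5}
EXTENDED_LEXICON = {**BASE_LEXICON, **SOCIAL_LEXICON}


def tokenize(text):
    """Lowercase and split on whitespace, keeping emoticons intact."""
    tokens = []
    for raw in text.lower().split():
        if any(ch.isalnum() for ch in raw):
            # Ordinary word: trim surrounding punctuation.
            token = raw.strip(".,!?;:\"'()")
        else:
            # Punctuation-only token such as ":)" is kept as-is.
            token = raw
        if token:
            tokens.append(token)
    return tokens


def lexicon_score(text, lexicon):
    """Sum the weights of all tokens found in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))


def classify(text, lexicon):
    """Map the sign of the lexicon score to a sentiment label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_features(text, lexicon):
    """Bag-of-words counts plus the lexicon score as one extra feature."""
    features = dict(Counter(tokenize(text)))
    features["__lexicon_score__"] = lexicon_score(text, lexicon)
    return features


tweet = "gr8 service lol :)"
base_label = classify(tweet, BASE_LEXICON)
extended_label = classify(tweet, EXTENDED_LEXICON)
tweet_features = build_features(tweet, EXTENDED_LEXICON)
```

In this toy case the base lexicon finds no known words in the tweet, while the extended lexicon picks up the slang and the emoticon. The feature dictionary could then be vectorized and passed to any classifier, with class weights as one possible way to approximate cost-sensitive training on unbalanced data.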