IR · Mar 18, 2019
Sentiment Analysis on IMDB Movie Comments and Twitter Data by Machine Learning and Vector Space Techniques
İlhan Tarımer, Adil Çoban, Arif Emre Kocaman
This study aims to build a sentiment analysis model for a 2,000-row IMDB movie comment dataset and a 3,200-row Twitter dataset using machine learning and vector space techniques, so that preliminary information on whether a text is positive or negative can be provided. A vector space was created on the KNIME Analytics Platform, and classification was performed on this vector space with the Decision Tree, Naive Bayes and Support Vector Machine (SVM) algorithms. The results of the three algorithms were then compared. For the IMDB movie comments, the classification results were 94.00%, 73.20% and 85.50% for Decision Tree, Naive Bayes and SVM, respectively. For the Twitter dataset, the corresponding results were 82.76%, 75.44% and 72.50%. The best classification results in both datasets were those calculated by the SVM algorithm.
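The abstract names the pipeline (build a vector space from the texts, then train a classifier on it) without showing what that involves. As a rough illustration, here is a minimal Python sketch of one branch of such a pipeline: a term-frequency vector space and a multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' KNIME workflow; the tokenizer, the helper names (`tokenize`, `build_vocabulary`, `to_vector`, `MultinomialNB`) and the four toy reviews are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Collect the sorted set of terms that defines the vector-space dimensions."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def to_vector(text, vocab):
    """Map a document to a term-frequency vector over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n_terms = len(vectors[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            # P(class) estimated from the share of training documents.
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Total count of each term across this class's documents.
            term_totals = [sum(col) for col in zip(*rows)]
            denom = sum(term_totals) + n_terms
            self.log_likelihood[c] = [math.log((t + 1) / denom) for t in term_totals]
        return self

    def predict(self, vector):
        def score(c):
            # Log prior plus term-frequency-weighted log likelihoods.
            return self.log_prior[c] + sum(
                x * w for x, w in zip(vector, self.log_likelihood[c]))
        return max(self.classes, key=score)


# Hypothetical toy training data (not from the IMDB or Twitter datasets).
train_docs = [
    "a wonderful, moving film with great acting",
    "great story and a wonderful cast",
    "a boring, predictable plot and terrible acting",
    "terrible film, boring from start to finish",
]
train_labels = ["positive", "positive", "negative", "negative"]

vocab = build_vocabulary(train_docs)
model = MultinomialNB().fit([to_vector(d, vocab) for d in train_docs], train_labels)
print(model.predict(to_vector("great acting and a wonderful story", vocab)))
```

Because every classifier consumes the same `to_vector` output, a decision tree or SVM could be trained on identical vectors, which is what makes a side-by-side comparison of algorithms on one vector space possible.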
LG · Jul 12, 2018
Feature Selection for Gender Classification in TUIK Life Satisfaction Survey
Adil Çoban, Ilhan Tarımer
Attribute selection is a method commonly applied before classification in data mining. In this study, a new dataset was created from the attributes expressing overall satisfaction in the Turkish Statistical Institute (TSI) Life Satisfaction Survey dataset. The attributes were ranked with the Ranking search method, using attribute selection algorithms in a data mining application. The selected attributes were then tested with two machine learning classifiers, Naive Bayes and Random Forest. The feature selection algorithms were compared by the number of attributes they selected and the classification accuracy achievable with them. The study, which aims to reduce the dataset volume, obtained its best classification result with the 3 attributes selected by the Chi2 algorithm; the best classification rate, 73%, was achieved with the Random Forest algorithm.
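Ranking-based attribute selection scores each attribute by how strongly it depends on the class and keeps the top ones. Assuming the Chi2 evaluator works the usual way, it scores an attribute with Pearson's chi-squared statistic on the attribute-versus-class contingency table. The minimal Python sketch below shows that idea for categorical survey answers and a binary gender label. The column names and the eight toy rows are hypothetical, not taken from the TSI dataset, and the study itself used a data mining application rather than this code.

```python
from collections import Counter


def chi2_score(values, labels):
    """Pearson chi-squared statistic for the contingency table of one
    categorical attribute against the class label."""
    n = len(labels)
    observed = Counter(zip(values, labels))
    value_totals = Counter(values)
    label_totals = Counter(labels)
    score = 0.0
    for v, v_total in value_totals.items():
        for c, c_total in label_totals.items():
            # Expected count if the attribute were independent of the class.
            expected = v_total * c_total / n
            score += (observed[(v, c)] - expected) ** 2 / expected
    return score


def rank_attributes(rows, attributes, labels):
    """Return (attribute, score) pairs sorted from most to least class-dependent."""
    scores = {a: chi2_score([r[a] for r in rows], labels) for a in attributes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy survey answers (column names are illustrative only).
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]
columns = {
    "q_happiness": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "q_health": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "q_region": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]

ranking = rank_attributes(rows, list(columns), labels)
top_k = [name for name, _ in ranking[:2]]  # keep the k highest-scoring attributes
print(ranking)
print(top_k)
```

Ranking scores each attribute independently of the others, so the number of attributes to keep (three in the study) is a separate cut-off decision; the reduced columns would then be passed to the classifiers.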