CL | Sep 19, 2024

Lexicon-Based Sentiment Analysis on Text Polarities with Evaluation of Classification Models

arXiv:2409.12840v1 | 7 citations | h-index: 3

AI Analysis

This is an incremental study applying existing methods to a specific domain (Twitter sentiment analysis) with standard evaluation.

This paper tackles sentiment analysis on Twitter data using lexicon-based methods and evaluates multiple classification models, achieving 81% accuracy with Random Forest on a dataset of 1.6 million tweets.

Sentiment analysis has broad potential applications on digital platforms. It extracts polarity to capture the intensity and subjectivity expressed in a text. This work uses a lexicon-based method to perform sentiment analysis and evaluates classification models trained on textual data. Lexicon-based methods identify the intensity of emotion and subjectivity at the word level. The categorization identifies the informative words in a text and assigns a quantitative ranking to the polarity of each word. The task is framed as a multi-class problem in which text is labeled positive, negative, or neutral. A Twitter sentiment dataset containing 1.6 million unprocessed tweets is used with lexicon-based methods such as TextBlob and VADER to introduce a neutrality measure on the text. The analysis of lexicons shows how word count and intensity classify the text. A comparative analysis of machine learning models, namely Naive Bayes, Support Vector Machines, Multinomial Logistic Regression, Random Forest, and Extreme Gradient Boosting (XGBoost), is performed across multiple performance metrics. The best results are achieved with Random Forest, with an accuracy score of 81%. Additionally, sentiment analysis is applied to a personality judgment case for a Twitter profile based on its online activity.
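The word-level scoring and three-way labeling described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's pipeline and does not use TextBlob or VADER: the lexicon `TOY_LEXICON`, the functions `polarity` and `label`, and the neutrality cutoff of 0.05 are all hypothetical, chosen only to show how per-word polarity scores might be aggregated into a positive, negative, or neutral label.

```python
import re

# Hypothetical toy lexicon mapping words to a polarity in [-1, 1].
# These values are illustrative only; they are not taken from TextBlob or VADER.
TOY_LEXICON = {
    "love": 0.8, "great": 0.7, "good": 0.5, "happy": 0.6,
    "bad": -0.5, "hate": -0.8, "terrible": -0.9, "sad": -0.6,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in text.

    Words missing from the lexicon are ignored; a text with no lexicon
    words scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(text, threshold=0.05):
    """Map a polarity score to one of three classes.

    Scores within +/- threshold of zero are treated as neutral.
    The threshold is an assumed value for this sketch.
    """
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this great phone",
                  "This is terrible and I hate it",
                  "Flight lands at noon"]:
        print(tweet, "->", label(tweet))
```

In a setup like the one the abstract describes, labels produced this way would presumably serve as training targets for the downstream classifiers; real lexicons also account for negation, intensifiers, and punctuation, which this sketch omits.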
