CL · May 6

A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset

arXiv:2605.04888
AI Analysis

For practitioners working on medium-scale social media sentiment analysis, this paper provides a benchmark comparison in which a simpler classical model outperforms a more complex deep learning architecture.

This study compares Logistic Regression with TF-IDF against a BiLSTM deep learning model on a 10,000-tweet subset of Sentiment140, finding that Logistic Regression achieves higher accuracy (73.5% vs. 69.17%) and suggesting that classical ML can outperform deep learning on medium-scale informal text.

The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep learning model exhibited mild overfitting. These findings suggest that for medium-scale informal text data, classical machine learning with robust feature extraction can outperform more complex deep learning approaches. Finally, the trained models were integrated into an interactive web application using Streamlit and deployed on Hugging Face Spaces for public access.
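As a rough illustration of the classical side of this comparison, the following from-scratch sketch builds TF-IDF vectors and fits a logistic regression classifier with plain gradient descent. It is not the authors' implementation, which the abstract does not show. The function names (`fit_idf`, `tfidf`, `train_logreg`, `predict`), the hyperparameters, and the six toy tweets are all assumptions made for this example, and the IDF uses the common smoothed form log((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would also strip URLs and mentions."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent on the log loss, without regularisation."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
            err = sigmoid(z) - y  # gradient of the log loss with respect to z
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias

def predict(doc, idf, weights, bias):
    """Return 1 (positive) or 0 (negative) for a single text."""
    vec = tfidf(doc, idf)
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy, invented tweets (1 = positive, 0 = negative); not Sentiment140 data.
train_texts = [
    "love this phone so much",
    "what a great day",
    "i love my friends",
    "hate waiting in line",
    "this is the worst day",
    "i hate this traffic",
]
train_labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_texts)
vectors = [tfidf(t, idf) for t in train_texts]
weights, bias = train_logreg(vectors, train_labels)

train_acc = sum(
    predict(t, idf, weights, bias) == y for t, y in zip(train_texts, train_labels)
) / len(train_labels)
print(f"training accuracy: {train_acc:.2f}")
```

On real data the model would be scored on a held-out split rather than on the training tweets. The gap between training and held-out accuracy is also the usual way to detect the kind of overfitting reported for the BiLSTM.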

