CL LG AP Sep 10, 2025

A meta-analysis on the performance of machine-learning based language models for sentiment analysis

Elena Rohde, Jonas Klingwort, Christian Borgs

arXiv:2509.09728v1 1 citation h-index: 5

Originality Synthesis-oriented

AI Analysis

This work addresses the need for reliable performance comparisons in sentiment analysis research. Its contribution is incremental: it synthesizes existing studies rather than introducing new methods.

This meta-analysis evaluated machine learning performance for sentiment analysis on Twitter data, finding an average overall accuracy of 0.80 [0.76, 0.84] and highlighting issues with accuracy metrics and reporting practices.
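One well-known issue with overall accuracy is its sensitivity to class imbalance. The sketch below illustrates this with a made-up binary confusion matrix (rows are true classes, columns are predicted classes). The counts and function names are illustrative assumptions and are not taken from the paper. Balanced accuracy (mean per-class recall) is shown as one common normalization, not necessarily the one the authors have in mind.

```python
def overall_accuracy(cm):
    """Share of all items that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def balanced_accuracy(cm):
    """Mean per-class recall; classes with no true items are skipped."""
    recalls = []
    for i, row in enumerate(cm):
        support = sum(row)
        if support > 0:
            recalls.append(row[i] / support)
    return sum(recalls) / len(recalls)


# Hypothetical test set: 900 negative and 100 positive tweets.
# The classifier labels every tweet as negative.
cm = [
    [900, 0],  # true negative class: all predicted negative
    [100, 0],  # true positive class: all predicted negative
]

print(overall_accuracy(cm))
print(balanced_accuracy(cm))
```

In this constructed case the overall accuracy is high even though the classifier never identifies the minority class, while the mean per-class recall sits at chance level for two classes.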

This paper presents a meta-analysis evaluating machine learning (ML) performance in sentiment analysis for Twitter data. The study aims to estimate the average performance, assess heterogeneity between and within studies, and analyze how study characteristics influence model performance. Following PRISMA guidelines, we searched academic databases and selected 195 trials from 20 studies, described by 12 study features. Overall accuracy, the most frequently reported performance metric, was analyzed using a double arcsine transformation and a three-level random effects model. The average overall accuracy of the AIC-optimized model was 0.80 [0.76, 0.84]. This paper provides two key insights: 1) Overall accuracy is widely used but often misleading because it is sensitive to class imbalance and to the number of sentiment classes, which highlights the need for normalization. 2) Standardized reporting of model performance, including confusion matrices for independent test sets, is essential for reliable comparisons of ML classifiers across studies, yet it seems far from common practice.
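As a rough illustration of the transformation step, the sketch below applies the Freeman-Tukey double arcsine transformation to a single trial's accuracy. The abstract does not say which variant was used. This version assumes the halved form with sampling variance 1/(4n + 2), and the trial counts are hypothetical. The back-transformation shown is the simple sin² approximation, not an exact inverse.

```python
import math


def double_arcsine(correct: int, total: int) -> tuple[float, float]:
    """Freeman-Tukey double arcsine transform of a proportion.

    Returns the transformed value and its approximate sampling variance.
    Assumes the halved variant: variance = 1 / (4 * total + 2).
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    t = 0.5 * (
        math.asin(math.sqrt(correct / (total + 1)))
        + math.asin(math.sqrt((correct + 1) / (total + 1)))
    )
    variance = 1.0 / (4 * total + 2)
    return t, variance


def approx_back_transform(t: float) -> float:
    """Approximate inverse of the halved variant; reasonable for large samples."""
    return math.sin(t) ** 2


# Hypothetical trial: 800 of 1000 test tweets classified correctly.
t, v = double_arcsine(800, 1000)
print(t, v, approx_back_transform(t))
```

The transformed values and their variances would then be passed to a multilevel random effects model, with trials nested within studies. That fitting step is left out here because it depends on the modeling software.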
