CL AI Jan 25, 2024

ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment Analysis

arXiv:2402.01715v1 45 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of assessing LLM performance in nuanced sentiment analysis for researchers and practitioners, though it is incremental as it applies existing methods to new data.

The study evaluated ChatGPT, Gemini, and LLaMA2 on multilingual sentiment analysis using ambiguous and ironic text across 10 languages, finding that ambiguous scenarios were often handled well by ChatGPT and Gemini but revealed significant biases and inconsistent performance across models and languages.

Automated sentiment analysis using Large Language Model (LLM)-based models like ChatGPT, Gemini or LLaMA2 is becoming widespread, both in academic research and in industrial applications. However, assessment and validation of their performance on ambiguous or ironic text remains poor. In this study, we constructed nuanced and ambiguous scenarios, translated them into 10 languages, and predicted their associated sentiment using popular LLMs. The results are validated against post-hoc human responses. ChatGPT and Gemini often cope well with ambiguous scenarios, but we recognise significant biases and inconsistent performance across models and across the evaluated human languages. This work provides a standardised methodology for automated sentiment analysis evaluation and calls for action to further improve the algorithms and their underlying data, in order to improve their performance, interpretability and applicability.
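
The validation step described in the abstract (comparing model predictions with human responses, per model and per language) can be illustrated with a minimal sketch. This is not the authors' code or metric: the record layout, the label set, the names `predictions`, `human_labels` and `agreement_by_model_and_language`, and all values below are hypothetical placeholders, and simple label agreement is assumed only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (model, language, scenario_id, predicted_label).
# The values are illustrative placeholders, not results from the study.
predictions = [
    ("model_a", "en", 1, "positive"),
    ("model_a", "en", 2, "negative"),
    ("model_a", "it", 1, "positive"),
    ("model_a", "it", 2, "positive"),
    ("model_b", "en", 1, "neutral"),
    ("model_b", "en", 2, "negative"),
]

# Hypothetical human reference label for each scenario.
human_labels = {1: "positive", 2: "negative"}


def agreement_by_model_and_language(predictions, human_labels):
    """Return {(model, language): fraction of scenarios matching the human label}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for model, language, scenario_id, label in predictions:
        key = (model, language)
        totals[key] += 1
        if label == human_labels[scenario_id]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


scores = agreement_by_model_and_language(predictions, human_labels)
for (model, language), score in sorted(scores.items()):
    print(f"{model} {language}: {score:.2f}")
```

Grouping by both model and language makes it possible to see whether a given model agrees with human raters in one language but not in another, which is the kind of inconsistency the abstract reports.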

