SE · Sep 7, 2024

Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models

Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo

arXiv:2310.11113 · 47 citations · h-index: 36 · Has Code

Originality: Synthesis-oriented

AI Analysis

For software engineering practitioners, this work provides guidance on when to use bigger LLMs (bLLMs) versus fine-tuned smaller LLMs (sLLMs) for sentiment analysis, but the findings are incremental as they confirm known trade-offs.

The study investigates whether bigger large language models (bLLMs) can address the labeled data shortage in sentiment analysis for software engineering. It finds that bLLMs achieve state-of-the-art performance on datasets with limited training data and imbalanced distributions, while fine-tuned smaller large language models (sLLMs) still achieve better results when training data is ample or the dataset is more balanced.

Software development involves collaborative interactions where stakeholders express opinions across various platforms. Recognizing the sentiments conveyed in these interactions is crucial for the effective development and ongoing maintenance of software systems. For software products, analyzing the sentiment of user feedback, e.g., reviews, comments, and forum posts, can provide valuable insights into user satisfaction and areas for improvement. This can guide the development of future updates and features. However, accurately identifying sentiments in software engineering datasets remains challenging. This study investigates the use of bigger large language models (bLLMs) to address the labeled data shortage that hampers fine-tuned smaller large language models (sLLMs) in software engineering tasks. We conduct a comprehensive empirical study using five established datasets to assess three open-source bLLMs in zero-shot and few-shot scenarios. Additionally, we compare them with fine-tuned sLLMs, using sLLMs to learn contextual embeddings of text from software platforms. Our experimental findings demonstrate that bLLMs exhibit state-of-the-art performance on datasets marked by limited training data and imbalanced distributions. bLLMs can also achieve excellent performance under a zero-shot setting. However, when ample training data is available or the dataset exhibits a more balanced distribution, fine-tuned sLLMs can still achieve superior results.
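The difference between the zero-shot and few-shot scenarios mentioned in the abstract can be illustrated with a minimal sketch. The prompt wording, the three-label set, and the helper names `build_prompt` and `parse_label` are assumptions made for illustration; they are not the templates or code used in the paper, and no particular model API is called.

```python
from typing import Optional, Sequence, Tuple

# Assumed label set for illustration; the datasets in the paper may differ.
LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str, shots: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a classification prompt.

    With no `shots` this is a zero-shot prompt; with labeled
    (text, label) pairs it becomes a few-shot prompt.
    """
    lines = [
        "Classify the sentiment of the software engineering text as "
        + ", ".join(LABELS) + ".",
        "",
    ]
    # Each labeled example is shown in the same format as the final query.
    for example_text, example_label in shots:
        lines += [f"Text: {example_text}", f"Sentiment: {example_label}", ""]
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


def parse_label(completion: str) -> Optional[str]:
    """Return the label mentioned earliest in the completion, or None."""
    lowered = completion.lower()
    hits = [(lowered.find(label), label) for label in LABELS if label in lowered]
    return min(hits)[1] if hits else None


if __name__ == "__main__":
    shots = [
        ("This API is a nightmare to debug.", "negative"),
        ("Thanks, the patch fixed the build!", "positive"),
    ]
    print(build_prompt("The docs could be clearer, but it works."))         # zero-shot
    print(build_prompt("The docs could be clearer, but it works.", shots))  # few-shot
```

The returned string would be sent to whichever model is under evaluation, and `parse_label` maps the free-text completion back to one of the assumed labels so that predictions can be scored.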

