Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM
The work addresses a limitation of traditional rating systems that matters to app developers and analysts, although its contribution is incremental: it applies existing LLM techniques to a specific domain.
The paper tackles the problem of star ratings failing to capture nuanced feedback in mobile app reviews by proposing an LLM-based framework that significantly outperforms baseline methods on three datasets in accuracy and robustness while also producing more actionable insights.
We present an approach to mobile app review analysis that addresses limitations inherent in traditional star-rating systems. Star ratings, although intuitive and popular among users, often fail to capture the nuanced feedback present in detailed review texts. Traditional NLP techniques (such as lexicon-based methods and classical machine learning classifiers) struggle to interpret contextual nuances, domain-specific terminology, and subtle linguistic features like sarcasm. To overcome these limitations, we propose a modular framework that leverages large language models (LLMs) enhanced by structured prompting techniques. Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed feature-level insights, and supports interactive exploration of reviews through retrieval-augmented conversational question answering (RAG-QA). Comprehensive experiments on three diverse datasets (AWARE, Google Play, and Spotify) demonstrate that our LLM-driven approach significantly surpasses baseline methods, improving accuracy and robustness and yielding more actionable insights in challenging, context-rich review scenarios.
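The abstract does not say how the rating-sentiment discrepancy is computed. The following is a minimal sketch under our own assumptions, not the authors' method. We assume the LLM is given a structured prompt asking for JSON with an overall sentiment score in [-1, 1] plus feature-level opinions. The star rating is then mapped linearly onto the same scale, and the discrepancy is the absolute difference. The prompt template, the JSON keys (`sentiment`, `features`), the function names, and the mismatch threshold are all hypothetical. The actual LLM call is stubbed out with a fixed reply string.

```python
import json
from dataclasses import dataclass, field

# Hypothetical structured prompt: asks the model to answer only in JSON,
# with an overall sentiment score and a list of feature-level opinions.
PROMPT_TEMPLATE = (
    "You are analysing a mobile app review.\n"
    "Return only JSON with keys \"sentiment\" (a number from -1 to 1) and "
    "\"features\" (a list of {{\"feature\": str, \"polarity\": str}}).\n"
    "Review: {review}"
)


def build_prompt(review: str) -> str:
    """Fill the (hypothetical) structured prompt with one review text."""
    return PROMPT_TEMPLATE.format(review=review)


@dataclass
class ReviewAnalysis:
    rating: int                    # star rating, 1..5
    sentiment: float               # LLM-estimated text sentiment, -1..1
    discrepancy: float             # |normalised rating - sentiment|, 0..2
    features: list = field(default_factory=list)  # feature-level opinions


def normalise_rating(stars: int) -> float:
    """Map a 1-5 star rating linearly onto [-1, 1] (3 stars -> 0)."""
    if not 1 <= stars <= 5:
        raise ValueError("star rating must be between 1 and 5")
    return (stars - 3) / 2


def analyse(stars: int, llm_reply: str) -> ReviewAnalysis:
    """Parse the model's JSON reply and score the rating/text gap."""
    data = json.loads(llm_reply)
    # Clamp in case the model returns a value outside the requested range.
    sentiment = max(-1.0, min(1.0, float(data["sentiment"])))
    gap = abs(normalise_rating(stars) - sentiment)
    return ReviewAnalysis(stars, sentiment, gap, data.get("features", []))


def is_mismatch(result: ReviewAnalysis, threshold: float = 1.0) -> bool:
    """Flag reviews whose text sentiment strongly disagrees with the stars."""
    return result.discrepancy >= threshold


# Usage with a stubbed LLM reply: a 5-star review whose text is negative.
reply = '{"sentiment": -0.8, "features": [{"feature": "login", "polarity": "negative"}]}'
result = analyse(5, reply)
print(round(result.discrepancy, 2))  # 1.8
print(is_mismatch(result))           # True
```

Putting both signals on a shared [-1, 1] scale keeps the discrepancy easy to interpret: 0 means the stars and the text agree, and 2 means they are maximally opposed. Any real threshold would need to be tuned on labelled data.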