3.9 SE Apr 18
Exploring Ethical Concerns of Mobile Applications from App Reviews: A Literature Survey
Aakash Sorathiya, Gouri Ginde
Ethical concerns in mobile applications (a.k.a. apps), such as privacy, security, and accessibility, are commonly subsumed under non-functional requirements and are generally reported by users through app reviews available in app stores. However, these remain unidentified among other types of reviews, such as user experiences, problem reports, and new feature discussions. Over the past decade, extensive research has focused on extracting valuable information from app reviews, including feature requests and bug reports. However, a synthesis of research on app review analysis for exploring users' ethical concerns is still lacking. This paper presents a comprehensive survey of this research area, covering 37 relevant studies published since 2012, identified from an initial set of 553 studies using specific inclusion and exclusion criteria. The studies examined vary in review counts, ranging from 500 to 626 million, and cover between one and 1.3 million apps. Our detailed analysis highlights diverse objectives, methodologies, and strategies, along with additional resources such as app privacy policies, which researchers generally utilize to analyze ethical concerns. Our findings also identify persistent barriers to privacy, security, accessibility, transparency, fairness, accountability, and safety, as reported by users in app reviews. Furthermore, we propose a research agenda that focuses on four key areas, including automated extraction and classification of ethical concerns-related app reviews. Our survey outcomes can assist developers and system architects in recognizing and prioritizing non-functional requirements at the initial stages of the development lifecycle, whereas researchers can expand upon this synthesis to create tools for the automated detection of ethical concerns.
CL Nov 11, 2024
Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews
Aakash Sorathiya, Gouri Ginde
With the increasing proliferation of mobile applications in our everyday experiences, the concerns surrounding ethics have surged significantly. Users generally communicate their feedback, report issues, and suggest new functionalities in application (app) reviews, frequently emphasizing safety, privacy, and accountability concerns. Incorporating these reviews is essential to developing successful products. However, app reviews related to ethical concerns generally use domain-specific language and are expressed using a more varied vocabulary, which makes automated extraction of ethical concern-related app reviews challenging and time-consuming. This study proposes a novel Natural Language Processing (NLP) based approach that combines Natural Language Inference (NLI), which provides a deep comprehension of language nuances, and a decoder-only (LLaMA-like) Large Language Model (LLM) to extract ethical concern-related app reviews at scale. Utilizing 43,647 app reviews from the mental health domain, the proposed methodology 1) evaluates four NLI models to extract potential privacy reviews and compares the results of domain-specific privacy hypotheses with generic privacy hypotheses; 2) evaluates four LLMs for classifying app reviews with respect to privacy concerns; and 3) uses the best NLI and LLM models to extract new privacy reviews from the dataset. Results show that the DeBERTa-v3-base-mnli-fever-anli NLI model with domain-specific hypotheses yields the best performance, and the Llama3.1-8B-Instruct LLM performs best in the classification of app reviews. Then, using NLI+LLM, an additional 1,008 new privacy-related reviews were extracted that were not identified through the keyword-based approach in previous research, thus demonstrating the effectiveness of the proposed approach.
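The two-stage NLI+LLM idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, and the toy scorers are all hypothetical stand-ins, and in practice `nli_score` and `llm_is_privacy` would wrap real models such as those named above.

```python
from typing import Callable, Iterable, List

# Hypothetical interfaces (not from the paper):
#   NliScorer:     (premise, hypothesis) -> entailment score in [0, 1]
#   LlmClassifier: review text -> True if judged a privacy concern
NliScorer = Callable[[str, str], float]
LlmClassifier = Callable[[str], bool]


def extract_privacy_reviews(
    reviews: Iterable[str],
    hypotheses: List[str],
    nli_score: NliScorer,
    llm_is_privacy: LlmClassifier,
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[str]:
    """Keep reviews that some hypothesis entails AND the classifier confirms."""
    selected = []
    for review in reviews:
        # Stage 1: NLI filter, taking the best score over all hypotheses.
        best = max(nli_score(review, h) for h in hypotheses)
        # Stage 2: the classifier is only consulted for NLI candidates.
        if best >= threshold and llm_is_privacy(review):
            selected.append(review)
    return selected


# Toy stand-ins so the sketch runs without any model download.
def toy_nli(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis words that also occur in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)


def toy_llm(review: str) -> bool:
    """Crude keyword check standing in for an LLM judgment."""
    return "data" in review.lower()


sample_reviews = [
    "This app shares my data with advertisers",
    "Great meditation sounds",
    "My data plan ran out while streaming",
    "The app shares my playlist",
]
sample_hypotheses = ["the app shares my data"]
result = extract_privacy_reviews(sample_reviews, sample_hypotheses, toy_nli, toy_llm)
```

With these toy scorers, the third review fails the NLI stage and the fourth passes it but is rejected by the classifier, which mirrors the role each stage plays in the pipeline.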
CL Jul 29, 2025
Automatic Classification of User Requirements from Online Feedback -- A Replication Study
Meet Bhatt, Nic Boilard, Muhammad Rehan Chaudhary et al.
Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Although RE research is rooted in empirical investigation, it has paid limited attention to replicating NLP for RE (NLP4RE) studies. The rapidly advancing realm of NLP is creating new opportunities for efficient, machine-assisted workflows, which can bring new perspectives and results to the forefront. Thus, we replicate and extend a previous NLP4RE study (baseline), "Classifying User Requirements from Online Feedback in Small Dataset Environments using Deep Learning", which evaluated different deep learning models for requirement classification from user reviews. We reproduced the original results using publicly released source code, thereby helping to strengthen the external validity of the baseline study. We then extended the setup by evaluating model performance on an external dataset and comparing results to a GPT-4o zero-shot classifier. Furthermore, we prepared the replication study ID-card for the baseline study, which is important for evaluating replication readiness. Results showed diverse reproducibility levels across different models, with Naive Bayes demonstrating perfect reproducibility. In contrast, BERT and other models showed mixed results. Our findings revealed that the baseline deep learning models, BERT and ELMo, exhibited good generalization capabilities on an external dataset, and GPT-4o showed performance comparable to traditional baseline machine learning models. Additionally, our assessment confirmed the baseline study's replication readiness; however, including the missing environment setup files would have further enhanced readiness. We include this missing information in our replication package and provide the replication study ID-card for our own study to further encourage and support its replication.
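The abstract does not say why Naive Bayes reproduced perfectly. One property that may contribute, offered here as an assumption rather than the authors' explanation, is that a multinomial Naive Bayes model is fit purely by counting, with no random initialization. The minimal pure-Python sketch below illustrates this; the function names and the four toy reviews are hypothetical and are not taken from the baseline study or its dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model by counting tokens per class."""
    class_docs = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens unseen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(scores), key=scores.get)


# Hypothetical toy data, for illustration only.
toy_texts = [
    "app crashes on login",
    "please add dark mode",
    "crashes when saving",
    "add export option",
]
toy_labels = ["bug", "feature", "bug", "feature"]

model_a = train_nb(toy_texts, toy_labels)
model_b = train_nb(toy_texts, toy_labels)  # second, independent "run"
prediction = predict_nb(model_a, "app crashes again")
```

Training twice on the same data yields identical counts and therefore identical predictions, whereas neural models such as BERT typically depend on random seeds, hardware, and library versions.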