SE | Jun 7, 2018 | Code
A Simple NLP-based Approach to Support Onboarding and Retention in Open Source Communities
Christoph Stanik, Lloyd Montgomery, Daniel Martens et al.
Successful open source communities constantly look for new members and help them become active developers. A common approach to developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues so that they can familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim to automatically identify issues that newcomers can resolve by analyzing the history of resolved issues, using only the title and description of each issue. Second, we aim to automatically identify issues that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest achieved up to 91% precision (F1-score 72%) for the first goal, while Decision Tree achieved a precision of 92% (F1-score 91%) for the second goal. A qualitative evaluation gave insights into what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues.
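The abstract reports precision and F1-score but not recall. As a rough reading aid, the sketch below uses the standard definition F1 = 2PR / (P + R) to back out the recall implied by each pair of figures. It assumes that each precision and F1 value comes from the same classifier configuration, which the abstract does not confirm ("up to 91%" may refer to a different run). The function names are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumption: both numbers describe the same classifier run.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# Figures quoted in the abstract (goal 1: Random Forest, goal 2: Decision Tree).
rf_recall = implied_recall(precision=0.91, f1=0.72)
dt_recall = implied_recall(precision=0.92, f1=0.91)

print(f"Random Forest implied recall: {rf_recall:.2f}")
print(f"Decision Tree implied recall: {dt_recall:.2f}")
```

Under that assumption, the implied recall is roughly 0.60 for the first goal and roughly 0.90 for the second, which suggests that the first classifier trades coverage for precision.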
SE | Jul 31, 2019
Extracting and Analyzing Context Information in User-Support Conversations on Twitter
Daniel Martens, Walid Maalej
While many apps include built-in options to report bugs or request features, users still provide an increasing amount of feedback via social media, like Twitter. Compared to traditional issue trackers, the reporting process in social media is unstructured and the feedback often lacks basic context information, such as the app version or the device concerned when experiencing the issue. To make this feedback actionable to developers, support teams engage in recurring, effortful conversations with app users to clarify missing context items. This paper introduces a simple approach that accurately extracts basic context information from unstructured, informal user feedback on mobile apps, including the platform, device, app version, and system version. Evaluated against a truthset of 3014 tweets from official Twitter support accounts of the 3 popular apps Netflix, Snapchat, and Spotify, our approach achieved precisions from 81% to 99% and recalls from 86% to 98% for the different context item types. Combined with a chatbot that automatically requests missing context items from reporting users, our approach aims at auto-populating issue trackers with structured bug reports.
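The abstract does not describe how the context items are extracted. One plausible minimal sketch, assuming simple regular expressions over the tweet text, is shown below. The patterns, the device keyword list, and the `extract_context` and `missing_items` helpers are all hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical patterns; the paper's actual extraction rules are not given here.
PLATFORM = re.compile(r"\b(ios|android)\b", re.IGNORECASE)
SYSTEM_VERSION = re.compile(r"\b(?:ios|android)\s*(\d+(?:\.\d+)*)", re.IGNORECASE)
APP_VERSION = re.compile(r"\b(?:version|v)\.?\s*(\d+(?:\.\d+)+)", re.IGNORECASE)
DEVICE = re.compile(
    r"\b(?:iphone|ipad|pixel|galaxy)(?:\s+(?:\w*\d\w*|xs|x|se)\b)?",
    re.IGNORECASE,
)


def extract_context(tweet: str) -> dict:
    """Return the four context items named in the abstract, or None if absent."""
    platform = PLATFORM.search(tweet)
    device = DEVICE.search(tweet)
    app_version = APP_VERSION.search(tweet)
    system_version = SYSTEM_VERSION.search(tweet)
    return {
        "platform": platform.group(1).lower() if platform else None,
        "device": device.group(0) if device else None,
        "app_version": app_version.group(1) if app_version else None,
        "system_version": system_version.group(1) if system_version else None,
    }


def missing_items(context: dict) -> list:
    """Context items a support chatbot would still have to ask the user for."""
    return [name for name, value in context.items() if value is None]


full = extract_context(
    "@SpotifyCares app keeps crashing on my iPhone 7, iOS 11.4, version 8.4.61"
)
partial = extract_context("Snapchat won't open on Android")

print(full)
print(missing_items(partial))
```

The `missing_items` helper mirrors the chatbot idea from the abstract: whatever the extractor cannot find becomes a follow-up question to the reporting user.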
SE | Jun 14, 2019
Release early, release often, and watch your users' emotions
Daniel Martens, Walid Maalej
App stores are highly competitive markets, sometimes offering dozens of apps for a single use case. Unexpected app changes such as a feature removal might incite even loyal users to explore alternative apps. Sentiment analysis tools can help monitor the emotions that users express, e.g., in app reviews or tweets. We found that these emotions follow four recurring patterns that correspond to app releases. Based on these patterns and online reports about popular apps, we derived five release lessons to help app vendors maintain positive emotions and gain competitive advantages.
IR | Apr 11, 2019
Towards Understanding and Detecting Fake Reviews in App Stores
Daniel Martens, Walid Maalej
App stores include an increasing amount of user feedback in the form of app ratings and reviews. Researchers and, more recently, tool vendors have proposed analytics and data mining solutions that make this feedback useful to developers and analysts, e.g., for supporting release decisions. Research has also shown that positive feedback improves apps' download and sales figures and thus their success. As a side effect, a market for fake, incentivized app reviews has emerged, with as yet unclear consequences for developers, app users, and app store operators. This paper studies fake reviews, their providers, their characteristics, and how well they can be automatically detected. We conducted disguised questionnaires with 43 fake review providers and studied their review policies to understand their strategies and offers. By comparing 60,000 fake reviews with 62 million reviews from the Apple App Store, we found significant differences, e.g., between the corresponding apps, reviewers, rating distribution, and frequency. This inspired the development of a simple classifier to automatically detect fake reviews in app stores. On a labelled and imbalanced dataset in which one-tenth of the reviews are fake, a proportion reported in other domains, our classifier achieved a recall of 91% and an AUC/ROC value of 98%. We discuss our findings and their impact on software engineering, app users, and app store operators.
SE | Mar 7, 2017
On the Emotion of Users in App Reviews
Daniel Martens, Timo Johann
App store analysis has become an important discipline in recent software engineering research. It empirically studies apps using information mined from their distribution platforms. Information provided by users, such as app reviews, is of high interest to developers. Commercial providers that analyze this information, such as App Annie, have become an important source for companies developing and marketing mobile apps. In this paper, we perform an exploratory study that analyzes the emotional sentiment of over seven million reviews from the Apple AppStore. Since recent research in this field has used sentiments to detail and refine results, we aim to gain deeper insights into the nature of sentiments in user reviews. In this study, we try to evaluate whether emotional sentiment can be an informative feature for software engineers, and we examine the pitfalls of its use. We present our initial results and discuss how they can be interpreted from the software engineering perspective.