Walid Maalej

h-index: 12

28 papers

1,246 citations

Novelty: 28%

AI Score: 40

Ranked #96,402 of 201,326 authors (top 48%); #1,190 in SE (top 35%)

28 Papers

AI · Feb 21, 2023

Tailoring Requirements Engineering for Responsible AI

Walid Maalej, Yen Dieu Pham, Larissa Chazette

Requirements Engineering (RE) is the discipline for identifying, analyzing, as well as ensuring the implementation and delivery of user, technical, and societal requirements. Recently reported issues concerning the acceptance of Artificial Intelligence (AI) solutions after deployment, e.g. in the medical, automotive, or scientific domains, stress the importance of RE for designing and delivering Responsible AI systems. In this paper, we argue that RE should not only be carefully conducted but also tailored for Responsible AI. We outline related challenges for research and practice.

LG · Apr 4, 2022

Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers

Jakob Smedegaard Andersen, Walid Maalej

To maximize the accuracy and increase the overall acceptance of text classifiers, we propose a framework for the efficient, in-operation moderation of classifiers' output. Our framework focuses on use cases in which F1-scores of modern neural network classifiers (ca. 90%) are still inapplicable in practice. We suggest a semi-automated approach that uses prediction uncertainties to pass unconfident, probably incorrect classifications to human moderators. To minimize the workload, we limit the human moderated data to the point where the accuracy gains saturate and further human effort does not lead to substantial improvements. A series of benchmarking experiments based on three different datasets and three state-of-the-art classifiers show that our framework can improve the classification F1-scores by 5.1 to 11.2% (up to approx. 98 to 99%), while reducing the moderation load by up to 73.3% compared to a random moderation.
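The routing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the least-confidence score, the fixed review `budget` (a stand-in for the saturation point mentioned above), and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def uncertainty(probs: Sequence[float]) -> float:
    """Least-confidence score (assumed here): 1 minus the top class probability."""
    return 1.0 - max(probs)


def split_for_moderation(
    predictions: List[Sequence[float]], budget: float
) -> Tuple[List[int], List[int]]:
    """Return (indices sent to human moderators, indices kept automatic).

    `budget` is a hypothetical parameter: the fraction of items (0..1)
    that humans are willing to review.
    """
    n_human = round(len(predictions) * budget)
    # Rank items from most to least uncertain.
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: uncertainty(predictions[i]),
        reverse=True,
    )
    return sorted(ranked[:n_human]), sorted(ranked[n_human:])


# Class probabilities for four example items from a binary classifier.
probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.51, 0.49],
]
to_human, automatic = split_for_moderation(probs, budget=0.5)
print(to_human, automatic)  # [1, 3] [0, 2]
```

In this toy run, the two near-coin-flip predictions go to moderators and the two confident ones are accepted automatically.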

SE · Apr 27, 2022

Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems

Clara Marie Lüders, Abir Bouraffa, Walid Maalej

Software projects use Issue Tracking Systems (ITS) like JIRA to track issues and organize the workflows around them. Issues are often inter-connected via different links such as the default JIRA link types Duplicate, Relate, Block, or Subtask. While previous research has mostly focused on analyzing and predicting duplication links, this work aims at understanding the various other link types, their prevalence, and characteristics towards a more reliable link type prediction. For this, we studied 607,208 links connecting 698,790 issues in 15 public JIRA repositories. Besides the default types, the custom types Depend, Incorporate, Split, and Cause were also common. We manually grouped all 75 link types used in the repositories into five general categories: General Relation, Duplication, Composition, Temporal / Causal, and Workflow. Comparing the structures of the corresponding graphs, we observed several trends. For instance, Duplication links tend to represent simpler issue graphs often with two components and Composition links present the highest amount of hierarchical tree structures (97.7%). Surprisingly, General Relation links have a significantly higher transitivity score than Duplication and Temporal / Causal links. Motivated by the differences between the link types and by their popularity, we evaluated the robustness of two state-of-the-art duplicate detection approaches from the literature on the JIRA dataset. We found that current deep-learning approaches confuse Duplication with other links in almost all repositories. On average, the classification accuracy dropped by 6% for one approach and 12% for the other. Extending the training sets with other link types seems to partly solve this issue. We discuss our findings and their implications for research and practice.

SE · Aug 30, 2024

Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach

Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais et al.

Over the past decade, app store (AppStore)-inspired requirements elicitation has proven to be highly beneficial. Developers often explore competitors' apps to gather inspiration for new features. With the advance of Generative AI, recent studies have demonstrated the potential of large language model (LLM)-inspired requirements elicitation. LLMs can assist in this process by providing inspiration for new feature ideas. While both approaches are gaining popularity in practice, there is a lack of insight into their differences. We report on a comparative study between AppStore- and LLM-based approaches for refining features into sub-features. By manually analyzing 1,200 sub-features recommended by both approaches, we identified their benefits, challenges, and key differences. While both approaches recommend highly relevant sub-features with clear descriptions, LLMs seem more powerful particularly concerning novel unseen app scopes. Moreover, some recommended features are imaginary with unclear feasibility, which suggests the importance of a human analyst in the elicitation loop.

AI · Aug 1, 2024

Can Developers Prompt? A Controlled Experiment for Code Documentation Generation

Hans-Alexander Kruse, Tim Puhlfürß, Walid Maalej

Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise and useful documentation. We report on a controlled experiment with 20 professionals and 30 computer science students tasked with code documentation generation for two Python functions. The experimental group freely entered ad-hoc prompts in a ChatGPT-like extension of Visual Studio Code, while the control group executed a predefined few-shot prompt. Our results reveal that professionals and students were unaware of or unable to apply prompt engineering techniques. Especially students perceived the documentation produced from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from prepared prompts. Some professionals produced higher quality documentation by just including the keyword Docstring in their ad-hoc prompts. While students desired more support in formulating prompts, professionals appreciated the flexibility of ad-hoc prompting. Participants in both groups rarely assessed the output as perfect. Instead, they understood the tools as support to iteratively refine the documentation. Further research is needed to understand which prompting skills and preferences developers have and which support they need for certain tasks.

SE · Mar 4

LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints

Jialiang Wei, Ali Ebrahimi Pourasad, Walid Maalej

User feedback is crucial for the evolution of mobile apps. However, research suggests that users tend to submit uninformative, vague, or destructive feedback. Unlike recent AI4SE approaches that focus on generating code and other development artifacts, our work aims at empowering users to submit better and more constructive UI feedback with concrete suggestions on how to improve the app. We propose LikeThis!, a GenAI-based approach that takes a user comment with the corresponding screenshot to immediately generate multiple improvement alternatives, from which the user can easily choose their preferred option. To evaluate LikeThis!, we first conducted a model benchmarking study based on a public dataset of carefully critiqued UI designs. The results show that GPT-Image-1 significantly outperformed three other state-of-the-art image generation models in improving the designs to address UI issues while keeping the fidelity and without introducing new issues. An intermediate step in LikeThis! is to generate a solution specification before sketching the design as a key to achieving effective improvement. Second, we conducted a user study with 10 production apps, where 15 users used LikeThis! to submit their feedback on encountered issues. Later, the developers of the apps assessed the understandability and actionability of the feedback with and without generated improvements. The results show that our approach helps generate better feedback from both user and developer perspectives, paving the way for AI-assisted user-developer collaboration.

SE · Mar 4

FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions

Ali Ebrahimi Pourasad, Meyssam Saghiri, Walid Maalej

User feedback is essential for the success of mobile apps, yet what users report and what developers need often diverge. Research shows that users often submit vague feedback and omit essential contextual details. This leads to incomplete reports and time-consuming clarification discussions. To overcome this challenge, we propose FeedAIde, a context-aware, interactive feedback approach that supports users during the reporting process by leveraging the reasoning capabilities of Multimodal Large Language Models. FeedAIde captures contextual information, such as the screenshot where the issue emerges, and uses it for adaptive follow-up questions to collaboratively refine with the user a rich feedback report that contains information relevant to developers. We implemented an iOS framework of FeedAIde and evaluated it on a gym's app with its users. Compared to the app's simple feedback form, participants rated FeedAIde as easier and more helpful for reporting feedback. An assessment by two industry experts of the resulting 54 reports showed that FeedAIde improved the quality of both bug reports and feature requests, particularly in terms of completeness. The findings of our study demonstrate the potential of context-aware, GenAI-powered feedback reporting to enhance the experience for users and increase the information value for developers.

SE · Feb 14, 2021 · Code

Automatically Matching Bug Reports With Related App Reviews

Marlo Häring, Christoph Stanik, Walid Maalej

App stores allow users to give valuable feedback on apps, and developers to find this feedback and use it for the software evolution. However, finding user feedback that matches existing bug reports in issue trackers is challenging as users and developers often use a different language. In this work, we introduce DeepMatcher, an automatic approach using state-of-the-art deep learning methods to match problem reports in app reviews to bug reports in issue trackers. We evaluated DeepMatcher with four open-source apps quantitatively and qualitatively. On average, DeepMatcher achieved a hit ratio of 0.71 and a Mean Average Precision of 0.55. For 91 problem reports, DeepMatcher did not find any matching bug report. When manually analyzing these 91 problem reports and the issue trackers of the studied apps, we found that in 47 cases, users actually described a problem before developers discovered and documented it in the issue tracker. We discuss our findings and different use cases for DeepMatcher.
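The two metrics reported in this abstract can be computed as in the following sketch. The abstract does not spell out the exact definitions used in the paper, so the common textbook forms are assumed here: a "hit" means at least one relevant bug report appears among the top-k suggestions, and average precision is normalized by the number of relevant items. All identifiers and the toy data are hypothetical.

```python
from typing import List, Set


def hit(ranked: List[str], relevant: Set[str], k: int) -> bool:
    """True if any of the top-k ranked items is relevant."""
    return any(item in relevant for item in ranked[:k])


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a relevant item occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Toy data: suggested bug reports per app review and the true matches.
suggestions = [["b1", "b2", "b3"], ["b4", "b5", "b6"]]
ground_truth = [{"b1", "b3"}, {"b7"}]

hit_ratio = sum(
    hit(r, g, k=3) for r, g in zip(suggestions, ground_truth)
) / len(suggestions)
mean_ap = sum(
    average_precision(r, g) for r, g in zip(suggestions, ground_truth)
) / len(suggestions)
print(hit_ratio, round(mean_ap, 3))
```

The first review contributes an average precision of (1/1 + 2/3) / 2, the second contributes 0 because its only matching report was never suggested.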

SE · Jun 7, 2018 · Code

A Simple NLP-based Approach to Support Onboarding and Retention in Open Source Communities

Christoph Stanik, Lloyd Montgomery, Daniel Martens et al.

Successful open source communities are constantly looking for new members and helping them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest achieved up to 91% precision (F1-score 72%) towards the first goal, while for the second goal, Decision Tree achieved a precision of 92% (F1-score 91%). A qualitative evaluation gave insights on what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues.

HC · Jan 17, 2025

How Do Programming Students Use Generative AI?

Christian Rahe, Walid Maalej

Programming students have widespread access to powerful Generative AI tools like ChatGPT. While this can help understand the learning material and assist with exercises, educators are voicing more and more concerns about an overreliance on generated outputs and lack of critical thinking skills. It is thus important to understand how students actually use generative AI and what impact this could have on their learning behavior. To this end, we conducted a study including an exploratory experiment with 37 programming students, giving them monitored access to ChatGPT while solving a code authoring exercise. The task was not directly solvable by ChatGPT and required code comprehension and reasoning. While only 23 of the students actually opted to use the chatbot, the majority of those eventually prompted it to simply generate a full solution. We observed two prevalent usage strategies: to seek knowledge about general concepts and to directly generate solutions. Instead of using the bot to comprehend the code and their own mistakes, students often got trapped in a vicious cycle of submitting wrong generated code and then asking the bot for a fix. Those who self-reported using generative AI regularly were more likely to prompt the bot to generate a solution. Our findings indicate that concerns about potential decrease in programmers' agency and productivity with Generative AI are justified. We discuss how researchers and educators can respond to the potential risk of students uncritically over-relying on Generative AI. We also discuss potential modifications to our study design for large-scale replications.

SE · Apr 30, 2024

GUing: A Mobile GUI Search Engine using a Vision-Language Model

Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais et al.

Graphical User Interfaces (GUIs) are central to app development projects. App developers may use the GUIs of other apps as a means of requirements refinement and rapid prototyping or as a source of inspiration for designing and improving their own apps. Recent research has thus suggested retrieving relevant GUI designs that match a certain text query from screenshot datasets acquired through crowdsourced or automated exploration of GUIs. However, such text-to-GUI retrieval approaches only leverage the textual information of the GUI elements, neglecting visual information such as icons or background images. In addition, retrieved screenshots are not steered by app developers and lack app features that require particular input data. To overcome these limitations, this paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip, which we trained specifically for the problem of designing app GUIs. For this, we first collected from Google Play app introduction images which display the most representative screenshots and are often captioned (i.e. labelled) by app vendors. Then, we developed an automated pipeline to classify, crop, and extract the captions from these images. This resulted in a large dataset, which we share with this paper, including 303k app screenshots, out of which 135k have captions. We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval. We evaluated our approach on various datasets from related work and in a manual experiment. The results demonstrate that our model outperforms previous approaches in text-to-GUI retrieval achieving a Recall@10 of up to 0.69 and a HIT@10 of 0.91. We also explored the performance of GUIClip for other GUI tasks including GUI classification and sketch-to-GUI retrieval with encouraging results.

HC · Jun 19, 2024

On AI-Inspired UI-Design

Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais et al.

The Graphical User Interface (or simply UI) is a primary means of interaction between users and their devices. In this paper, we discuss three complementary Artificial Intelligence (AI) approaches for triggering the creativity of app designers and inspiring them to create better and more diverse UI designs. First, designers can prompt a Large Language Model (LLM) to directly generate and adjust UIs. Second, a Vision-Language Model (VLM) enables designers to effectively search a large screenshot dataset, e.g. from apps published in app stores. Third, a Diffusion Model (DM) can be trained to specifically generate UIs as inspirational images. We present an AI-inspired design process and discuss the implications and limitations of the approaches.

SE · Jan 20, 2022

An Alternative Issue Tracking Dataset of Public Jira Repositories

Lloyd Montgomery, Clara Lüders, Walid Maalej

Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can be independently improved, commented on, linked to other issues, and progressed through the organisational workflow. Commonly studied ITSs so far include GitHub, GitLab, and Bugzilla, while Jira, one of the most popular ITS in practice with a wealth of additional information, has yet to receive similar attention. Unfortunately, diverse public Jira datasets are rare, likely due to the difficulty in finding and accessing these repositories. With this paper, we release a dataset of 16 public Jiras with 1822 projects, spanning 2.7 million issues with a combined total of 32 million changes, 9 million comments, and 1 million issue links. We believe this Jira dataset will lead to many fruitful research projects investigating issue evolution, issue linking, cross-project analysis, as well as cross-tool analysis when combined with existing well-studied ITS datasets.

SE · Aug 19, 2021

Unsupervised Topic Discovery in User Comments

Christoph Stanik, Tim Pietz, Walid Maalej

On social media platforms like Twitter, users regularly share their opinions and comments with software vendors and service providers. Popular software products might get thousands of user comments per day. Research has shown that such comments contain valuable information for stakeholders, such as feature ideas, problem reports, or support inquiries. However, it is hard to manually manage and grasp a large amount of user comments, which can be redundant and of a different quality. Consequently, researchers suggested automated approaches to extract valuable comments, e.g., through problem report classifiers. However, these approaches do not aggregate semantically similar comments into specific aspects to provide insights like how often users reported a certain problem. We introduce an approach for automatically discovering topics composed of semantically similar user comments based on deep bidirectional natural language processing algorithms. Stakeholders can use our approach without the need to configure critical parameters like the number of clusters. We present our approach and report on a rigorous multiple-step empirical evaluation to assess how cohesive and meaningful the resulting clusters are. Each evaluation step was peer-coded and resulted in inter-coder agreements of up to 98%, giving us high confidence in the approach. We also report a thematic analysis on the topics discovered from tweets in the telecommunication domain.

SE · Aug 12, 2021

Lessons Learned from Customizing and Applying ACTA to Design a Novel Device for Emergency Medical Care

Christoph Stanik, Tim Puhlfürß, Anne Mahler et al.

Preclinical patient care is both mentally and physically challenging and exhausting for emergency teams. The teams intensively use medical technology to help the patient on site. However, they must carry and handle multiple heavy medical devices such as a monitor for the patient's vital signs, a ventilator to support an unconscious patient, and a resuscitation device. In an industry project, we aim at developing a combined device that lowers the emergency teams' mental and physical load caused by multiple screens, devices, and their high weight. The focus of this paper is to describe our ideation and requirements elicitation process regarding the user interface design of the combined device. For one year, we applied a fully digital customized version of the Applied Cognitive Task Analysis (ACTA) method to systematically elicit the requirements. Domain and requirements engineering experts created a detailed hierarchical task diagram of an extensive emergency scenario, conducted eleven interviews with subject matter experts (SMEs), and executed two design workshops, which led to 34 sketches and three mockups of the combined device's user interface. Cross-functional teams accompanied the entire process and brought together expertise in preclinical patient care, requirements engineering, and medical product development. We report on the lessons learned for each of the four consecutive stages of our customized ACTA process.

SE · Oct 26, 2020

Renovating Requirements Engineering: First Thoughts to Shape Requirements Engineering as a Profession

Yen Dieu Pham, Lloyd Montgomery, Walid Maalej

Legacy software systems typically include vital data for organizations that use them and should thus be regularly maintained. Ideally, organizations should rely on Requirements Engineers to understand and manage changes of stakeholder needs and system constraints. However, due to time and cost pressure, and with a heavy focus on implementation, organizations often choose to forgo Requirements Engineers and rather focus on ad-hoc bug fixing and maintenance. This position paper discusses what Requirements Engineers could possibly learn from other similar roles to become crucial for the evolution of legacy systems. Particularly, we compare the roles of Requirements Engineers (according to IREB), Building Architects (according to the German regulations), and Product Owners (according to "The Scrum-Guide"). We discuss overlaps along four dimensions: liability, self-portrayal, core activities, and artifacts. Finally, we draw insights from these related fields to foster the concept of a Requirements Engineer as a distinguished profession.

SE · Sep 17, 2019

OpenReq Issue Link Map: A Tool to Visualize Issue Links in Jira

Clara Marie Lüders, Mikko Raatikainen, Joaquim Motger et al.

Managing software projects gets more and more complicated with an increasing project and product size. To cope with this complexity, many organizations use issue tracking systems, where tasks, bugs, and requirements are stored as issues. Unfortunately, managing software projects might remain chaotic even when using issue trackers. Particularly for long lasting projects with a large number of issues and links between them, it is often hard to maintain an overview of the dependencies, especially when dozens of new issues get reported every day. We present a Jira plug-in that supports developers, project managers, and product owners in managing and overviewing issues and their dependencies. Our tool visualizes the issue links, helps to find missing or unknown links between issues, and detects inconsistencies.

SE · Sep 12, 2019

Requirements Intelligence with OpenReq Analytics

Christoph Stanik, Walid Maalej

With the rise of social media like Twitter and distribution platforms like app stores, users have various ways to express their opinions about software products. Popular software vendors receive thousands of user feedback items per day. Research has shown that such feedback contains valuable information for software development teams. However, a manual analysis of user feedback is cumbersome and hard to manage. We present OpenReq Analytics, a software requirements intelligence service that collects, processes, analyzes, and visualizes user feedback.

CL · Sep 12, 2019

Classifying Multilingual User Feedback using Traditional Machine Learning and Deep Learning

Christoph Stanik, Marlo Haering, Walid Maalej

With the rise of social media like Twitter and of software distribution platforms like app stores, users have gained various ways to express their opinions about software products. Popular software vendors receive thousands of user feedback items per day. Research has shown that such feedback contains valuable information for software development teams, such as problem reports or feature and support inquiries. Since the manual analysis of user feedback is cumbersome and hard to manage, many researchers and tool vendors have suggested using automated analyses based on traditional supervised machine learning approaches. In this work, we compare the results of traditional machine learning and deep learning in classifying user feedback in English and Italian into problem reports, inquiries, and irrelevant. Our results show that using traditional machine learning, we can still achieve comparable results to deep learning, although we collected thousands of labels.

SE · Jul 31, 2019

Extracting and Analyzing Context Information in User-Support Conversations on Twitter

Daniel Martens, Walid Maalej

While many apps include built-in options to report bugs or request features, users still provide an increasing amount of feedback via social media, like Twitter. Compared to traditional issue trackers, the reporting process in social media is unstructured and the feedback often lacks basic context information, such as the app version or the device concerned when experiencing the issue. To make this feedback actionable to developers, support teams engage in recurring, effortful conversations with app users to clarify missing context items. This paper introduces a simple approach that accurately extracts basic context information from unstructured, informal user feedback on mobile apps, including the platform, device, app version, and system version. Evaluated against a truthset of 3014 tweets from official Twitter support accounts of the 3 popular apps Netflix, Snapchat, and Spotify, our approach achieved precisions from 81% to 99% and recalls from 86% to 98% for the different context item types. Combined with a chatbot that automatically requests missing context items from reporting users, our approach aims at auto-populating issue trackers with structured bug reports.
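The abstract does not describe how the context items are extracted, so the following is only an illustrative sketch of the task, not the authors' approach. It assumes a purely rule-based extractor built on regular expressions; every pattern, field name, and the example tweet are hypothetical, and the device pattern covers just a few product families.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for the four context item types named in the abstract.
PLATFORM = re.compile(r"\b(ios|android)\b", re.IGNORECASE)
DEVICE = re.compile(
    r"\b(iphone\s?\d+\w*|galaxy\s\w+|pixel\s\d+\w*)\b", re.IGNORECASE
)
APP_VERSION = re.compile(r"\b(?:version|v)\s?(\d+(?:\.\d+)+)\b", re.IGNORECASE)
SYSTEM_VERSION = re.compile(
    r"\b(?:ios|android)\s?(\d+(?:\.\d+)*)\b", re.IGNORECASE
)


def extract_context(text: str) -> Dict[str, Optional[str]]:
    """Return the first match per context item, or None if it is missing."""

    def first(pattern: re.Pattern) -> Optional[str]:
        match = pattern.search(text)
        return match.group(1) if match else None

    platform = first(PLATFORM)
    return {
        "platform": platform.lower() if platform else None,
        "device": first(DEVICE),
        "app_version": first(APP_VERSION),
        "system_version": first(SYSTEM_VERSION),
    }


tweet = "App keeps crashing on my iPhone 8, iOS 12.1, app version 8.4.2"
context = extract_context(tweet)
missing = [name for name, value in context.items() if value is None]
print(context, missing)
```

A support chatbot of the kind mentioned above could use the `missing` list to decide which follow-up questions to ask the reporting user.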

SE · Jul 23, 2019

On Using Machine Learning to Identify Knowledge in API Reference Documentation

Davide Fucci, Alireza Mollaalizadehbahnemiri, Walid Maalej

Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) and deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., multiple binary classifiers) the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary. For four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification) deep learning outperforms naïve baselines and traditional machine learning achieving a MacroAUC up to 79%. We also compared classifiers using embeddings pre-trained on generic text corpora and StackOverflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. The accuracy related to the remaining types seems API-specific. We discuss our results and how they inform the development of tools for supporting developers sharing and accessing API knowledge. Published article: https://doi.org/10.1145/3338906.3338943

SE · Jun 14, 2019

Release early, release often, and watch your users' emotions

Daniel Martens, Walid Maalej

App stores are highly competitive markets, sometimes offering dozens of apps for a single use case. Unexpected app changes such as a feature removal might incite even loyal users to explore alternative apps. Sentiment analysis tools can help monitor users' emotions expressed, e.g., in app reviews or tweets. We found that these emotions include four recurring patterns corresponding to the app releases. Based on these patterns and online reports about popular apps, we derived five release lessons to assist app vendors in maintaining positive emotions and gaining competitive advantages.

IR · Apr 11, 2019

Towards Understanding and Detecting Fake Reviews in App Stores

Daniel Martens, Walid Maalej

App stores include an increasing amount of user feedback in the form of app ratings and reviews. Researchers and, more recently, tool vendors have proposed analytics and data mining solutions to leverage this feedback for developers and analysts, e.g., for supporting release decisions. Research also showed that positive feedback improves apps' downloads and sales figures and thus their success. As a side effect, a market for fake, incentivized app reviews emerged with yet unclear consequences for developers, app users, and app store operators. This paper studies fake reviews, their providers, characteristics, and how well they can be automatically detected. We conducted disguised questionnaires with 43 fake review providers and studied their review policies to understand their strategies and offers. By comparing 60,000 fake reviews with 62 million reviews from the Apple App Store we found significant differences, e.g., between the corresponding apps, reviewers, rating distribution, and frequency. This inspired the development of a simple classifier to automatically detect fake reviews in app stores. On a labelled and imbalanced dataset including one-tenth of fake reviews, as reported in other domains, our classifier achieved a recall of 91% and an AUC/ROC value of 98%. We discuss our findings and their impact on software engineering, app users, and app store operators.

CY · Oct 2, 2018

Who is Addressed in this Comment? Automatically Classifying Meta-Comments in News Comments

Marlo Häring, Wiebke Loosen, Walid Maalej

User comments have become an essential part of online journalism. However, newsrooms are often overwhelmed by the vast number of diverse comments, for which a manual analysis is barely feasible. Identifying meta-comments that address or mention newsrooms, individual journalists, or moderators and that may call for reactions is particularly critical. In this paper, we present an automated approach to identify and classify meta-comments. We compare comment classification based on manually extracted features with an end-to-end learning approach. We develop, optimize, and evaluate multiple classifiers on a comment dataset of the large German online newsroom SPIEGEL Online and the 'One Million Posts' corpus of DER STANDARD, an Austrian newspaper. Both optimized classification approaches achieved encouraging $F_{0.5}$ values between 76% and 91%. We report on the most significant classification features with the results of a qualitative analysis and discuss how our work contributes to making participation in online journalism more constructive.
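For readers unfamiliar with the $F_{0.5}$ measure reported above, the standard F-beta formula is $F_\beta = (1+\beta^2) \cdot P \cdot R / (\beta^2 \cdot P + R)$, where $P$ is precision and $R$ is recall. With $\beta = 0.5$, precision weighs more than recall. The sketch below shows that formula only; the function name and the example numbers are illustrative and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up values: the same pair of scores, swapped between the two roles.
precise = f_beta(precision=0.9, recall=0.6)   # high precision, lower recall
thorough = f_beta(precision=0.6, recall=0.9)  # lower precision, high recall
print(round(precise, 3), round(thorough, 3))
```

The precision-heavy classifier scores higher under $F_{0.5}$, whereas both would receive the same $F_1$ score.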

SEAug 7, 2018

Needs and Challenges for a Platform to Support Large-scale Requirements Engineering. A Multiple Case Study

Davide Fucci, Cristina Palomares, Dolors Costal et al.

Background: Requirements engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as the number and heterogeneity of stakeholders, motivates the development of methods and tools for improving large-scale requirements engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirements engineering for individual stakeholders, based on the current pain-points of their organizations when dealing with a large number of requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain-point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimate of how long a requirements engineering task will take to complete, along with easier identification of requirements dependencies and a requirements reuse strategy. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirements engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing practices and tools or to design new ones.

SEJul 2, 2018

App Store 2.0: From Crowd Information to Actionable Feedback in Mobile Ecosystems

María Gómez, Bram Adams, Walid Maalej et al.

Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app store and discuss the different kinds of actionable feedback that app stores can generate using crowdsourced information.

CYMar 28, 2018

A First Implementation of a Design Thinking Workshop During a Mobile App Development Project Course

Yen Dieu Pham, Davide Fucci, Walid Maalej

Due to their characteristics, millennials prefer learning-by-doing and social learning, such as project-based learning. However, software development projects require not only technical skills but also creativity; Design Thinking can serve this purpose. We conducted a workshop following the Design Thinking approach of the d.school to help students generate ideas for a mobile app development project course. In addition to the details of implementing the workshop, we report our observations and lessons learned, and provide suggestions for further implementation.

SEJul 27, 2017

Find, Understand, and Extend Development Screencasts on YouTube

Mathias Ellmann, Alexander Oeser, Davide Fucci et al.

A software development screencast is a video that captures the screen of a developer working on a particular task while explaining its implementation details. Due to the increased popularity of software development screencasts (e.g., available on YouTube), we study how and to what extent they can be used as an additional source of knowledge to answer developers' questions about, for example, the use of a specific API. We first differentiate between development and other types of screencasts using video frame analysis. By using the Cosine algorithm, developers can expect ten development screencasts in the top 20 out of 100 different YouTube videos. We then extracted popular development topics covered by screencasts on YouTube: database operations, system set-up, plug-in development, game development, and testing. In addition, we found six recurring tasks performed in development screencasts, such as object usage and UI operations. Finally, we conducted a similarity analysis by considering only the spoken words (i.e., the screencast transcripts but not the text that might appear in a scene) to link API documents, such as the Javadoc, to the appropriate screencasts. By using Cosine similarity, we identified 38 relevant documents in the top 20 out of 9455 API documents.
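The transcript-to-document linking described above can be sketched as a cosine similarity ranking over word counts. This is a minimal illustration only: the whitespace tokenization, the raw term-frequency weighting, and the names `cosine_similarity` and `rank_documents` are assumptions, not the authors' pipeline, and the sample documents are invented.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_documents(transcript: str, docs: dict, top_k: int = 20) -> list:
    """Return up to top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(transcript, text))
              for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Invented sample data standing in for API documents and a transcript.
docs = {
    "ArrayList": "add element to list",
    "Socket": "open network connection",
}
print(rank_documents("add an element to the list", docs))
```

A realistic setup would likely add stop-word removal and TF-IDF weighting, which the abstract does not specify.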