Mojtaba Shahin

h-index24

37papers

1,389citations

Novelty23%

AI Score51

Ranked #38,903 of 201,326 authors (top 19%)#369 in SE (top 11%)

37 Papers

SESep 25, 2024Code

Demystifying Issues, Causes and Solutions in LLM Open-Source Projects

Yangxiao Cai, Peng Liang, Yifei Wang et al.

With the advancements of Large Language Models (LLMs), an increasing number of open-source software projects are using LLMs as their core functional component. Although research and practice on LLMs are capturing considerable interest, no dedicated studies explored the challenges faced by practitioners of LLM open-source projects, the causes of these challenges, and potential solutions. To fill this research gap, we conducted an empirical study to understand the issues that practitioners encounter when developing and using LLM open-source software, the possible causes of these issues, and potential solutions. We collected all closed issues from 15 LLM open-source projects and labelled issues that met our requirements. We then randomly selected 994 issues from the labelled issues as the sample for data extraction and analysis to understand the prevalent issues, their underlying causes, and potential solutions. Our study results show that (1) Model Issue is the most common issue faced by practitioners, (2) Model Problem, Configuration and Connection Problem, and Feature and Method Problem are identified as the most frequent causes of the issues, and (3) Optimize Model is the predominant solution to the issues. Based on the study results, we provide implications for practitioners and researchers of LLM open-source projects.

87.2SEApr 7Code

Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects

Syed Mohammad Kashif, Ruiyin Li, Peng Liang et al.

New generation of AI coding tools, including AI-powered IDEs equipped with agentic capabilities, can generate code within the context of the project. These AI IDEs are increasingly perceived as capable of producing project-level code at scale. However, there is limited empirical evidence on the extent to which they can generate large-scale software systems and what design issues such systems may exhibit. To address this gap, we conducted a study to explore the capability of Cursor in generating large-scale projects and to evaluate the design quality of projects generated by Cursor. First, we propose a Feature-Driven Human-In-The-Loop (FD-HITL) framework that systematically guides project generation from curated project descriptions. We generated 10 projects using Cursor with the FD-HITL framework across three application domains and multiple technologies. We assessed the functional correctness of these projects through manual evaluation, obtaining an average functional correctness score of 91%. Next, we analyzed the generated projects using two static analysis tools, CodeScene and SonarQube, to detect design issues. We identified 1,305 design issues categorized into 9 categories by CodeScene and 3,193 issues in 11 categories by SonarQube. Our findings show that (1) when used with the FD-HITL framework, Cursor can generate functional large-scale projects averaging 16,965 LoC and 114 files; (2) the generated projects nevertheless contain design issues that may pose long-term maintainability and evolvability risks, requiring careful review by experienced developers; (3) the most prevalent issues include Code Duplication, high Code Complexity, Large Methods, Framework Best-Practice Violations, Exception-Handling Issues and Accessibility Issues; (4) these design issues violate design principles such as SRP, SoC, and DRY. The replication package is at https://github.com/Kashifraz/DIinAGP

SEDec 28, 2025Code

FasterPy: An LLM-based Code Execution Efficiency Optimization Framework

Yue Wu, Minghao Han, Ruiyin Li et al.

Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant loops, repeated computations), making them labor-intensive and limited in applicability. In recent years, machine learning and deep learning-based methods have emerged as promising alternatives by learning optimization heuristics from annotated code corpora and performance measurements. However, these approaches usually depend on specific program representations and meticulously crafted training datasets, making them costly to develop and difficult to scale. With the booming of Large Language Models (LLMs), their remarkable capabilities in code generation have opened new avenues for automated code optimization. In this work, we proposed FasterPy, a low-cost and efficient framework that adapts LLMs to optimize the execution efficiency of Python code. FasterPy combines Retrieval-Augmented Generation (RAG), supported by a knowledge base constructed from existing performance-improving code pairs and corresponding performance measurements, with Low-Rank Adaptation (LoRA) to enhance code optimization performance. Our experimental results on the Performance Improving Code Edits (PIE) benchmark demonstrate that our method outperforms existing models on multiple metrics. The FasterPy tool and the experimental results are available at https://github.com/WuYue22/fasterpy.

83.0SEMay 2Code

Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

Yifei Wang, Ruiyin Li, Peng Liang et al.

Recent advancements in Large Language Models (LLMs) have demonstrated significant potential across a wide range of software engineering tasks, including software design, an area traditionally regarded as highly dependent on human expertise and judgment. However, there has been little research focusing on how LLMs are used in software design, nor on the associated benefits and drawbacks. This paper aims to bridge this gap by empirically investigating how software developers utilize LLMs in the context of software design. We conduct a mixed-methods study, combining a mining study of 291 developer-ChatGPT conversations shared on GitHub with a survey of 65 software practitioners. Our findings reveal nine distinct categories of design tasks supported by ChatGPT, including architecture design, data model design, and the use of design patterns. We further characterize developer-ChatGPT interactions, showing that developers primarily use ChatGPT for knowledge acquisition and design-related code generation, with most tasks situated at the detailed design level. The study identifies seven key benefits of utilizing LLMs in software design as perceived by developers, such as better technology selection and the early detection of design flaws. We also uncover six limitations, including the generation of overly lengthy and difficult-to-read outputs, the creation of inexecutable or incorrect code, and a heavy reliance on context that can lead to hallucinated results. These findings provide an evidence-based characterization of current LLM use in software design from both open-source and practitioner perspectives, highlighting a tension between perceived benefits and limitations, which lays a foundation for future research and the development of effective techniques and tools to integrate LLMs into software design practices.

14.6SEMay 6

Engineering for Crisis Management: A User-Centred Analysis of Disaster Mobile Applications

Muhamad Syukron, Anuradha Madugalla, Mojtaba Shahin et al.

Disaster mobile apps play an increasingly important role in disseminating hazard information and supporting communities during emergency situations. This study presents a comprehensive analysis of these mobile applications, focusing on their features, user-reported challenges, and opportunities for improvement. We first examined the landscape of disaster mobile apps by analysing 70 apps identified through a combination of methods, including those from the literature, the Google Play Store, and the App Store. The analysis categorised apps based on disaster focus, geographic coverage, popularity, monetisation strategies, and features across the disaster lifecycle. We then extracted, translated and analysed user reviews using topic modelling and sentiment analysis to identify key concerns and recurring issues. The results show that most applications prioritise response-related functionalities, with limited support for preparedness and recovery. User feedback highlights critical challenges related to technical reliability, usability, accessibility, and information clarity. Based on these findings, we propose a set of recommendations for developers and emergency management agencies to improve the reliability, inclusiveness, and overall effectiveness of disaster mobile apps. These include adopting lifecycle-oriented design approaches, strengthening multilingual support, improving technical robustness, and integrating user feedback into development processes. This work contributes to the growing body of research on human-centred disaster risk reduction by providing empirical insights and actionable guidance for the design of more reliable and inclusive disaster communication systems.

SENov 11, 2025

Designing LLM-based Multi-Agent Systems for Software Engineering Tasks: Quality Attributes, Design Patterns and Rationale

Yangxiao Cai, Ruiyin Li, Peng Liang et al.

As the complexity of Software Engineering (SE) tasks continues to escalate, Multi-Agent Systems (MASs) have emerged as a focal point of research and practice due to their autonomy and scalability. Furthermore, through leveraging the reasoning and planning capabilities of Large Language Models (LLMs), the application of LLM-based MASs in the field of SE is garnering increasing attention. However, there is no dedicated study that systematically explores the design of LLM-based MASs, including the Quality Attributes (QAs) on which the designers mainly focus, the design patterns used by the designers, and the rationale guiding the design of LLM-based MASs for SE tasks. To this end, we conducted a study to identify the QAs that LLM-based MASs for SE tasks focus on, the design patterns used in the MASs, and the design rationale for the MASs. We collected 94 papers on LLM-based MASs for SE tasks as the source. Our study shows that: (1) Code Generation is the most common SE task solved by LLM-based MASs among ten identified SE tasks, (2) Functional Suitability is the QA on which designers of LLM-based MASs pay the most attention, (3) Role-Based Cooperation is the design pattern most frequently employed among 16 patterns used to construct LLM-based MASs, and (4) Improving the Quality of Generated Code is the most common rationale behind the design of LLM-based MASs. Based on the study results, we presented the implications for the design of LLM-based MASs to support SE tasks.

SEJan 29

Age Matters: Analyzing Age-Related Discussions in App Reviews

Shashiwadana Nirmania, Garima Sharma, Hourieh Khalajzadeh et al.

In recent years, mobile applications have become indispensable tools for managing various aspects of life. From enhancing productivity to providing personalized entertainment, mobile apps have revolutionized people's daily routines. Despite this rapid growth and popularity, gaps remain in how these apps address the needs of users from different age groups. Users of varying ages face distinct challenges when interacting with mobile apps, from younger users dealing with inappropriate content to older users having difficulty with usability due to age-related vision and cognition impairments. Although there have been initiatives to create age-inclusive apps, a limited understanding of user perspectives on age-related issues may hinder developers from recognizing specific challenges and implementing effective solutions. In this study, we explore age discussions in app reviews to gain insights into how mobile apps should cater to users across different age groups.We manually curated a dataset of 4,163 app reviews from the Google Play Store and identified 1,429 age-related reviews and 2,734 non-age-related reviews. We employed eight machine learning, deep learning, and large language models to automatically detect age discussions, with RoBERTa performing the best, achieving a precision of 92.46%. Additionally, a qualitative analysis of the 1,429 age-related reviews uncovers six dominant themes reflecting user concerns.

3.2SEMar 23

One-Year Internship Program on Software Engineering: Students' Perceptions and Educators' Lessons Learned

Golnoush Abaei, Mojtaba Shahin, Maria Spichkova

The inclusion of internship courses in Software Engineering (SE) programs is essential for closing knowledge gaps and improving graduates' readiness for the software industry. Our study focuses on year-long internships at RMIT University (Melbourne, Australia), which offers in-depth industry engagement. We analysed how the course evolved over the last 10 years to incorporate students' needs and summarised the lessons learned that can be helpful for other educators supporting internship courses. Our qualitative analysis of internship data based on 91 reports during 2023-2024 identified three challenge themes the students faced, and which courses were found by students to be particularly beneficial during their internships. On this basis, we proposed recommendations for educators and companies to help interns overcome challenges and maximise their learning experience.

SEOct 24, 2025Code

ArchISMiner: A Framework for Automatic Mining of Architectural Issue-Solution Pairs from Online Developer Communities

Musengamana Jean de Dieu, Ruiyin Li, Peng Liang et al.

Stack Overflow (SO), a leading online community forum, is a rich source of software development knowledge. However, locating architectural knowledge, such as architectural solutions remains challenging due to the overwhelming volume of unstructured content and fragmented discussions. Developers must manually sift through posts to find relevant architectural insights, which is time-consuming and error-prone. This study introduces ArchISMiner, a framework for mining architectural knowledge from SO. The framework comprises two complementary components: ArchPI and ArchISPE. ArchPI trains and evaluates multiple models, including conventional ML/DL models, Pre-trained Language Models (PLMs), and Large Language Models (LLMs), and selects the best-performing model to automatically identify Architecture-Related Posts (ARPs) among programming-related discussions. ArchISPE employs an indirect supervised approach that leverages diverse features, including BERT embeddings and local TextCNN features, to extract architectural issue-solution pairs. Our evaluation shows that the best model in ArchPI achieves an F1-score of 0.960 in ARP detection, and ArchISPE outperforms baselines in both SE and NLP fields, achieving F1-scores of 0.883 for architectural issues and 0.894 for solutions. A user study further validated the quality (e.g., relevance and usefulness) of the identified ARPs and the extracted issue-solution pairs. Moreover, we applied ArchISMiner to three additional forums, releasing a dataset of over 18K architectural issue-solution pairs. Overall, ArchISMiner can help architects and developers identify ARPs and extract succinct, relevant, and useful architectural knowledge from developer communities more accurately and efficiently. The replication package of this study has been provided at https://github.com/JeanMusenga/ArchISPE

SEDec 30, 2021Code

An Empirical Study of Security Practices for Microservices Systems

Ali Rezaei Nasab, Mojtaba Shahin, Seyed Ali Hoseyni Raviz et al.

Despite the numerous benefits of microservices systems, security has been a critical issue in such systems. Several factors explain this difficulty, including a knowledge gap among microservices practitioners on properly securing a microservices system. To (partially) bridge this gap, we conducted an empirical study. We first manually analyzed 861 microservices security points, including 567 issues, 9 documents, and 3 wiki pages from 10 GitHub open-source microservices systems and 306 Stack Overflow posts concerning security in microservices systems. In this study, a microservices security point is referred to as "a GitHub issue, a Stack Overflow post, a document, or a wiki page that entails 5 or more microservices security paragraphs". Our analysis led to a catalog of 28 microservices security practices. We then ran a survey with 74 microservices practitioners to evaluate the usefulness of these 28 practices. Our findings demonstrate that the survey respondents affirmed the usefulness of the 28 practices. We believe that the catalog of microservices security practices can serve as a valuable resource for microservices practitioners to more effectively address security issues in microservices systems. It can also inform the research community of the required or less explored areas to develop microservices-specific security practices and tools.

SESep 20, 2021Code

Pandemic Software Development: The Student Experiences from Developing a COVID-19 Information Dashboard

Benjamin Koh, Mojtaba Shahin, Annette Ong et al.

The COVID-19 pandemic has birthed a wealth of information through many publicly accessible sources, such as news outlets and social media. However, gathering and understanding the content can be difficult due to inaccuracies or inconsistencies between the different sources. To alleviate this challenge in Australia, a team of 48 student volunteers developed an open-source COVID-19 information dashboard to provide accurate, reliable, and real-time COVID-19 information for Australians. The students developed this software while working under legislative restrictions that required social isolation. The goal of this study is to characterize the experiences of the students throughout the project. We conducted an online survey completed by 39 of the volunteering students contributing to the COVID-19 dashboard project. Our results indicate that playing a positive role in the COVID-19 crisis and learning new skills and technologies were the most cited motivating factors for the students to participate in the project. While working on the project, some students struggled to maintain a work-life balance due to working from home. However, the students generally did not express strong sentiment towards general project challenges. The students expressed more strongly that data collection was a significant challenge as it was difficult to collect reliable, accurate, and up-to-date data from various government sources. The students have been able to mitigate these challenges by establishing a systematic data collection process in the team, leveraging frequent and clear communication through text, and appreciating and encouraging each other's efforts. By participating in the project, the students boosted their technical (e.g., front-end development) and non-technical (e.g., task prioritization) skills. Our study discusses several implications for students, educators, and policymakers.

SEApr 25, 2021Code

On the Nature of Issues in Five Open Source Microservices Systems: An Empirical Study

Muhammad Waseem, Peng Liang, Mojtaba Shahin et al.

Due to its enormous benefits, the research and industry communities have shown an increasing interest in the Microservices Architecture (MSA) style over the last few years. Despite this, there is a limited evidence-based and thorough understanding of the types of issues (e.g., faults, errors, failures, mistakes) faced by microservices system developers and causes that trigger the issues. Such evidence-based understanding of issues and causes is vital for long-term, impactful, and quality research and practice in the MSA style. To that end, we conducted an empirical study on 1,345 issue discussions extracted from five open source microservices systems hosted on GitHub. Our analysis led to the first of its kind taxonomy of the types of issues in open source microservices systems, informing that the problems originating from Technical debt (321, 23.86%), Build (145, 10.78%), Security (137, 10.18%), and Service execution and communication (119, 8.84%) are prominent. We identified that "General programming errors", "Poor security management", "Invalid configuration and communication", and "Legacy versions, compatibility and dependency" are the predominant causes for the leading four issue categories. Study results streamline a taxonomy of issues, their mapping with underlying causes, and present empirical findings that could facilitate research and development on emerging and next-generation microservices systems.

SEJan 29, 2024

An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors

Jiaxin Yu, Peng Liang, Yujia Fu et al.

Security code review is a time-consuming and labor-intensive process typically requiring integration with automated security defect detection tools. However, existing security analysis tools struggle with poor generalization, high false positive rates, and coarse detection granularity. Large Language Models (LLMs) have been considered promising candidates for addressing those challenges. In this study, we conducted an empirical study to explore the potential of LLMs in detecting security defects during code review. Specifically, we evaluated the performance of six LLMs under five different prompts and compared them with state-of-the-art static analysis tools. We also performed linguistic and regression analyses for the best-performing LLM to identify quality problems in its responses and factors influencing its performance. Our findings showthat: (1) existing pre-trained LLMs have limited capability in security code review but significantly outperformthe state-of-the-art static analysis tools. (2) GPT-4 performs best among all LLMs when provided with a CWE list for reference. (3) GPT-4 frequently generates verbose or non-compliant responses with the task requirements given in the prompts. (4) GPT-4 is more adept at identifying security defects in code files with fewer tokens, containing functional logic, or written by developers with less involvement in the project.

SEApr 29, 2025

Using LLMs in Generating Design Rationale for Software Architecture Decisions

Xiyu Zhou, Ruiyin Li, Peng Liang et al.

Design Rationale (DR) for software architecture decisions refers to the reasoning underlying architectural choices, which provides valuable insights into the different phases of the architecting process throughout software development. However, in practice, DR is often inadequately documented due to a lack of motivation and effort from developers. With the recent advancements in Large Language Models (LLMs), their capabilities in text comprehension, reasoning, and generation may enable the generation and recovery of DR for architecture decisions. In this study, we evaluated the performance of LLMs in generating DR for architecture decisions. First, we collected 50 Stack Overflow (SO) posts, 25 GitHub issues, and 25 GitHub discussions related to architecture decisions to construct a dataset of 100 architecture-related problems. Then, we selected five LLMs to generate DR for the architecture decisions with three prompting strategies, including zero-shot, chain of thought (CoT), and LLM-based agents. With the DR provided by human experts as ground truth, the Precision of LLM-generated DR with the three prompting strategies ranges from 0.267 to 0.278, Recall from 0.627 to 0.715, and F1-score from 0.351 to 0.389. Additionally, 64.45% to 69.42% of the arguments of DR not mentioned by human experts are also helpful, 4.12% to 4.87% of the arguments have uncertain correctness, and 1.59% to 3.24% of the arguments are potentially misleading. To further understand the trustworthiness and applicability of LLM-generated DR in practice, we conducted semi-structured interviews with six practitioners. Based on the experimental and interview results, we discussed the pros and cons of the three prompting strategies, the strengths and limitations of LLM-generated DR, and the implications for the practical use of LLM-generated DR.

SEJan 16, 2024

Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps

Ali Rezaei Nasab, Maedeh Dashti, Mojtaba Shahin et al.

Fairness is one of the socio-technical concerns that must be addressed in software systems. Considering the popularity of mobile software applications (apps) among a wide range of individuals worldwide, mobile apps with unfair behaviors and outcomes can affect a significant proportion of the global population, potentially more than any other type of software system. Users express a wide range of socio-technical concerns in mobile app reviews. This research aims to investigate fairness concerns raised in mobile app reviews. Our research focuses on AI-based mobile app reviews as the chance of unfair behaviors and outcomes in AI-based mobile apps may be higher than in non-AI-based apps. To this end, we first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.

SEJan 15, 2022

How are Diverse End-user Human-centric Issues Discussed on GitHub?

Hourieh Khalajzadeh, Mojtaba Shahin, Humphrey O. Obie et al.

Many software systems fail to meet the needs of the diverse end-users in society and are prone to pose problems, such as accessibility and usability issues. Some of these problems (partially) stem from the failure to consider the characteristics, limitations, and abilities of diverse end-users during software development. We refer to this class of problems as human-centric issues. Despite their importance, there is a limited understanding of the types of human-centric issues encountered by developers. In-depth knowledge of these human-centric issues is needed to design software systems that better meet their diverse end-users' needs. This paper aims to provide insights for the software development and research communities on which human-centric issues are a topic of discussion for developers on GitHub. We conducted an empirical study by extracting and manually analysing 1,691 issue comments from 12 diverse projects, ranging from small to large-scale projects, including projects designed for challenged end-users, e.g., visually impaired and dyslexic users. Our analysis shows that eight categories of human-centric issues are discussed by developers. These include Inclusiveness, Privacy & Security, Compatibility, Location & Language, Preference, Satisfaction, Emotional Aspects, and Accessibility. Guided by our findings, we highlight some implications and possible future paths to further understand and incorporate human-centric issues in software development to be able to design software that meets the needs of diverse end users in society.

SEJan 15, 2022

Decision Models for Selecting Patterns and Strategies in Microservices Systems and their Evaluation by Practitioners

Muhammad Waseem, Peng Liang, Aakash Ahmad et al.

Researchers and practitioners have recently proposed many Microservices Architecture (MSA) patterns and strategies covering various aspects of microservices system life cycle, such as service design and security. However, selecting and implementing these patterns and strategies can entail various challenges for microservices practitioners. To this end, this study proposes decision models for selecting patterns and strategies covering four MSA design areas: application decomposition into microservices, microservices security, microservices communication, and service discovery. We used peer-reviewed and grey literature to identify the patterns, strategies, and quality attributes for creating these decision models. To evaluate the familiarity, understandability, completeness, and usefulness of the decision models, we conducted semi-structured interviews with 24 microservices practitioners from 12 countries across five continents. Our evaluation results show that the practitioners found the decision models as an effective guide to select microservices patterns and strategies.

SEDec 21, 2021

How Do Developers Search for Architectural Information? An Industrial Survey

Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin

Building software systems often requires knowledge and skills beyond what developers already possess. In such cases, developers have to leverage different sources of information to seek help. A growing number of researchers and practitioners have started investigating what programming-related information developers seek during software development. However, being a high level and a type of the most important development-related information, architectural information search activity is seldom explored. To fill this gap, we conducted an industrial survey completed by 103 participants to understand how developers search for architectural information to solve their architectural problems in development. Our main findings are: (1) searching for architectural information to learn about the pros and cons of certain architectural solutions (e.g., patterns, tactics) and to make an architecture decision among multiple choices are the most frequent purposes or tasks; (2) developers find difficulties mostly in getting relevant architectural information for addressing quality concerns and making design decisions among multiple choices when seeking architectural information; (3) taking too much time to go through architectural information retrieved from various sources and feeling overwhelmed due to the dispersion and abundance of architectural information in various sources are the top two major challenges developers face when searching for architectural information. Our findings (1) provide researchers with future directions, such as the design and development of approaches and tools for searching architectural information from multiple sources, and (2) can be used to provide guidelines for practitioners to refer to when seeking architectural information and providing architectural information that could be considered useful.

SENov 30, 2021

The Impact of Considering Human Values during Requirements Engineering Activities

Harsha Perera, Rashina Hoda, Rifat Ara Shams et al.

Human values, or what people hold important in their life, such as freedom, fairness, and social responsibility, often remain unnoticed and unattended during software development. Ignoring values can lead to values violations in software that can result in financial losses, reputation damage, and widespread social and legal implications. However, embedding human values in software is not only non-trivial but also generally an unclear process. Commencing as early as during the Requirements Engineering (RE) activities promises to ensure fit-for-purpose and quality software products that adhere to human values. But what is the impact of considering human values explicitly during early RE activities? To answer this question, we conducted a scenario-based survey where 56 software practitioners contextualised requirements analysis towards a proposed mobile application for the homeless and suggested values-laden software features accordingly. The suggested features were qualitatively analysed. Results show that explicit considerations of values can help practitioners identify applicable values, associate purpose with the features they develop, think outside-the-box, and build connections between software features and human values. Finally, drawing from the results and experiences of this study, we propose a scenario-based values elicitation process -- a simple four-step takeaway as a practical implication of this study.

SEOct 11, 2021

Human Values in Mobile App Development: An Empirical Study on Bangladeshi Agriculture Mobile Apps

Rifat Ara Shams, Mojtaba Shahin, Gillian Oliver et al.

Given the ubiquity of mobile applications (apps) in daily lives, understanding and reflecting end-users' human values (e.g., transparency, privacy, social recognition etc.) in apps has become increasingly important. Violations of end users' values by software applications have been reported in the media and have resulted in a wide range of difficulties for end users. Value violations may bring more and lasting problems for marginalized and vulnerable groups of end-users. This research aims to understand the extent to which the values of Bangladeshi female farmers, marginalized and vulnerable end-users, who are less studied by the software engineering community, are reflected in agriculture apps in Bangladesh. Further to this, we aim to identify possible strategies to embed their values in those apps. To this end, we conducted a mixed-methods empirical study consisting of 13 interviews with app practitioners and four focus groups with 20 Bangladeshi female farmers. The accumulated results from the interviews and focus groups identified 22 values of Bangladeshi female farmers, which the participants expect to be reflected in the agriculture apps. Among these 22 values, 15 values (e.g., accuracy, independence) are already reflected and 7 values (e.g., accessibility, pleasure) are ignored/violated in the existing agriculture apps. We also identified 14 strategies (e.g., "applying human-centered approaches to elicit values", "establishing a dedicated team/person for values concerns") to address Bangladeshi female farmers' values in agriculture apps.

SEOct 8, 2021

A Decision Model for Selecting Patterns and Strategies to Decompose Applications into Microservices

Muhammad Waseem, Peng Liang, Gastón Márquez et al.

Microservices Architecture (MSA) style is a promising design approach to develop software applications consisting of multiple small and independently deployable services. Over the past few years, researchers and practitioners have proposed many MSA patterns and strategies covering various aspects of microservices design, such as application decomposition. However, selecting appropriate patterns and strategies can entail various challenges for practitioners. To this end, this study proposes a decision model for selecting patterns and strategies to decompose applications into microservices. We used peer-reviewed and grey literature to collect the patterns, strategies, and quality attributes for creating this decision model.

SEOct 5, 2021

Does Domain Change the Opinion of Individuals on Human Values? A Preliminary Investigation on eHealth Apps End-users

Humphrey Obie, Mojtaba Shahin, John Grundy et al.

The elicitation of end-users' human values - such as freedom, honesty, transparency, etc. - is important in the development of software systems. We carried out two preliminary Q-studies to understand (a) the general human value opinion types of eHealth applications (apps) end-users (b) the eHealth domain human value opinion types of eHealth apps end-users (c) whether there are differences between the general and eHealth domain opinion types. Our early results show three value opinion types using generic value instruments: (1) fun-loving, success-driven and independent end-user, (2) security-conscious, socially-concerned, and success-driven end-user, and (3) benevolent, success-driven, and conformist end-user Our results also show two value opinion types using domain-specific value instruments: (1) security-conscious, reputable, and honest end-user, and (2) success-driven, reputable and pain-avoiding end-user. Given these results, consideration should be given to domain context in the design and application of values elicitation instruments.

SEOct 2, 2021

How Secondary School Girls Perceive Computational Thinking Practices through Collaborative Programming with the Micro:bit

Mojtaba Shahin, Chris Gonsalvez, Jon Whittle et al.

Computational Thinking (CT) has been investigated from different perspectives. This research aims to investigate how secondary school girls perceive CT practices -- the problem-solving practices that students apply while they are engaged in programming -- when using the micro:bit device in a collaborative setting. This study also explores the collaborative programming process of secondary school girls with the micro:bit device. We conducted mixed-methods research with 203 secondary school girls (in the state of Victoria, Australia) and 31 mentors attending a girls-only CT program (OzGirlsCT program). The girls were grouped into 52 teams and collaboratively developed computational solutions around realistic, important problems to them and their communities. We distributed two surveys (with 193 responses each) to the girls. Further, we surveyed the mentors (with 31 responses) who monitored the girls, and collected their observation reports on their teams. Our study indicates that the girls found "debugging" the most difficult type of CT practice to apply, while collaborative practices of CT were the easiest. We found that prior coding experience significantly reduced the difficulty level of only one CT practice - "debugging". Our study also identified six challenges the girls faced and six best practices they adopted when working on their computational solutions.

SEAug 15, 2021

A Qualitative Study of Architectural Design Issues in DevOps

Mojtaba Shahin, Ali Rezaei Nasab, Muhammad Ali Babar

Software architecture is critical in succeeding with DevOps. However, designing software architectures that enable and support DevOps (DevOps-driven software architectures) is a challenge for organizations. We assert that one of the essential steps towards characterizing DevOps-driven architectures is to understand architectural design issues raised in DevOps. At the same time, some of the architectural issues that emerge in the DevOps context (and their corresponding architectural practices or tactics) may stem from the context (i.e., domain) and characteristics of software organizations. To this end, we conducted a mixed-methods study that consists of a qualitative case study of two teams in a company during their DevOps transformation and a content analysis of Stack Overflow and DevOps Stack Exchange posts to understand architectural design issues in DevOps. Our study found eight specific and contextual architectural design issues faced by the two teams and classified architectural design issues discussed in Stack Overflow and DevOps Stack Exchange into 11 groups. Our aggregated results reveal that the main characteristics of DevOps-driven architectures are: being loosely coupled and prioritizing deployability, testability, supportability, and modifiability over other quality attributes. Finally, we discuss some concrete implications for research and practice.

SEAug 12, 2021

Operationalizing Human Values in Software Engineering: A Survey

Mojtaba Shahin, Waqar Hussain, Arif Nurwidyantoro et al.

Human values (e.g., pleasure, privacy, and social justice) are what a person or a society considers important. The inability to address them in software-intensive systems can result in numerous undesired consequences (e.g., financial losses) for individuals and communities. Various solutions (e.g., methodologies, techniques) are developed to help "operationalize values in software". The ultimate goal is to ensure building software (better) reflects and respects human values. In this survey, "operationalizing values" is referred to as the process of identifying human values and translating them to accessible and concrete concepts so that they can be implemented, validated, verified, and measured in software. This paper provides a deep understanding of the research landscape on operationalizing values in software engineering, covering 51 primary studies. It also presents an analysis and taxonomy of 51 solutions for operationalizing values in software engineering. Our survey reveals that most solutions attempt to help operationalize values in the early phases (requirements and design) of the software development life cycle. However, the later phases (implementation and testing) and other aspects of software development (e.g., "team organization") still need adequate consideration. We outline implications for research and practice and identify open issues and future research directions to advance this area.

SEAug 7, 2021

Design, Monitoring, and Testing of Microservices Systems: The Practitioners' Perspective

Muhammad Waseem, Peng Liang, Mojtaba Shahin et al.

Context: Microservices Architecture (MSA) has received significant attention in the software industry. However, little empirical evidence exists on design, monitoring, and testing of microservices systems. Objective: This research aims to gain a deep understanding of how microservices systems are designed, monitored, and tested in the industry. Method: A mixed-methods study was conducted with 106 survey responses and 6 interviews from microservices practitioners. Results: The main findings are: (1) a combination of domain-driven design and business capability is the most used strategy to decompose an application into microservices, (2) over half of the participants used architecture evaluation and architecture implementation when designing microservices systems, (3) API gateway and Backend for frontend patterns are the most used MSA patterns, (4) resource usage and load balancing as monitoring metrics, log management and exception tracking as monitoring practices are widely used, (5) unit and end-to-end testing are the most used testing strategies, and (6) the complexity of microservices systems poses challenges for their design, monitoring, and testing, for which there are no dedicated solutions. Conclusions: Our findings reveal that more research is needed to (1) deal with microservices complexity at the design level, (2) handle security in microservices systems, and (3) address the monitoring and testing challenges through dedicated solutions.

SEJul 23, 2021

Towards a Human Values Dashboard for Software Development: An Exploratory Study

Arif Nurwidyantoro, Mojtaba Shahin, Michel Chaudron et al.

Background: There is a growing awareness of the importance of human values (e.g., inclusiveness, privacy) in software systems. However, there are no practical tools to support the integration of human values during software development. We argue that a tool that can identify human values from software development artefacts and present them to varying software development roles can (partially) address this gap. We refer to such a tool as human values dashboard. Further to this, our understanding of such a tool is limited. Aims: This study aims to (1) investigate the possibility of using a human values dashboard to help address human values during software development, (2) identify possible benefits of using a human values dashboard, and (3) elicit practitioners' needs from a human values dashboard. Method: We conducted an exploratory study by interviewing 15 software practitioners. A dashboard prototype was developed to support the interview process. We applied thematic analysis to analyse the collected data. Results: Our study finds that a human values dashboard would be useful for the development team (e.g., project manager, developer, tester). Our participants acknowledge that development artefacts, especially requirements documents and issue discussions, are the most suitable source for identifying values for the dashboard. Our study also yields a set of high-level user requirements for a human values dashboard (e.g., it shall allow determining values priority of a project). Conclusions: Our study suggests that a values dashboard is potentially used to raise awareness of values and support values-based decision-making in software development. Future work will focus on addressing the requirements and using issue discussions as potential artefacts for the dashboard.

SEJul 21, 2021

Automated Identification of Security Discussions in Microservices Systems: Industrial Surveys and Experiments

Ali Rezaei Nasab, Mojtaba Shahin, Peng Liang et al.

Lack of awareness and knowledge of microservices-specific security challenges and solutions often leads to ill-informed security decisions in microservices system development. We claim that identifying and leveraging security discussions scattered in existing microservices systems can partially close this gap. We define security discussion as "a paragraph from developer discussions that includes design decisions, challenges, or solutions relating to security". We first surveyed 67 practitioners and found that securing microservices systems is a unique challenge and that having access to security discussions is useful for making security decisions. The survey also confirms the usefulness of potential tools that can automatically identify such security discussions. We developed fifteen machine/deep learning models to automatically identify security discussions. We applied these models on a manually constructed dataset consisting of 4,813 security discussions and 12,464 non-security discussions. We found that all the models can effectively identify security discussions: an average precision of 84.86%, recall of 72.80%, F1-score of 77.89%, AUC of 83.75% and G-mean 82.77%. DeepM1, a deep learning model, performs the best, achieving above 84% in all metrics and significantly outperforms three baselines. Finally, the practitioners' feedback collected from a validation survey reveals that security discussions identified by DeepM1 have promising applications in practice.

SEJul 15, 2021

Characteristics and Challenges of Low-Code Development: The Practitioners' Perspective

Yajing Luo, Peng Liang, Chong Wang et al.

Background: In recent years, Low-code development (LCD) is growing rapidly, and Gartner and Forrester have predicted that the use of LCD is very promising. Giant companies, such as Microsoft, Mendix, and Outsystems have also launched their LCD platforms. Aim: In this work, we explored two popular online developer communities, Stack Overflow (SO) and Reddit, to provide insights on the characteristics and challenges of LCD from a practitioners' perspective. Method: We used two LCD related terms to search the relevant posts in SO and extracted 73 posts. Meanwhile, we explored three LCD related subreddits from Reddit and collected 228 posts. We extracted data from these posts and applied the Constant Comparison method to analyze the descriptions, benefits, and limitations and challenges of LCD. For platforms and programming languages used in LCD, implementation units in LCD, supporting technologies of LCD, types of applications developed by LCD, and domains that use LCD, we used descriptive statistics to analyze and present the results. Results: Our findings show that: (1) LCD may provide a graphical user interface for users to drag and drop with little or even no code; (2) the equipment of out-of-the-box units (e.g., APIs and components) in LCD platforms makes them easy to learn and use as well as speeds up the development; (3) LCD is particularly favored in the domains that have the need for automated processes and workflows; and (4) practitioners have conflicting views on the advantages and disadvantages of LCD. Conclusions: Our findings suggest that researchers should clearly define the terms when they refer to LCD, and developers should consider whether the characteristics of LCD are appropriate for their projects.

SEFeb 24, 2021

How Can Human Values Be Addressed in Agile Methods? A Case Study on SAFe

Waqar Hussain, Mojtaba Shahin, Rashina Hoda et al.

Agile methods are predominantly focused on delivering business values. But can Agile methods be adapted to effectively address and deliver human values such as social justice, privacy, and sustainability in the software they produce? Human values are what an individual or a society considers important in life. Ignoring these human values in software can pose difficulties or risks for all stakeholders (e.g., user dissatisfaction, reputation damage, financial loss). To answer this question, we selected the Scaled Agile Framework (SAFe), one of the most commonly used Agile methods in the industry, and conducted a qualitative case study to identify possible intervention points within SAFe that are the most natural to address and integrate human values in software. We present five high-level empirically-justified sets of interventions in SAFe: artefacts, roles, ceremonies, practices, and culture. We elaborate how some current Agile artefacts (e.g., user story), roles (e.g., product owner), ceremonies (e.g., stand-up meeting), and practices (e.g., business-facing testing) in SAFe can be modified to support the inclusion of human values in software. Further, our study suggests new and exclusive values-based artefacts (e.g., legislative requirement), ceremonies (e.g., values conversation), roles (e.g., values champion), and cultural practices (e.g., induction and hiring) to be introduced in SAFe for this purpose. Guided by our findings, we argue that existing Agile methods can account for human values in software delivery with some evolutionary adaptations.

SEDec 18, 2020

A First Look at Human Values-Violation in App Reviews

Humphrey O. Obie, Waqar Hussain, Xin Xia et al.

Ubiquitous technologies such as mobile software applications (mobile apps) have a tremendous influence on the evolution of the social, cultural, economic, and political facets of life in society. Mobile apps fulfil many practical purposes for users including entertainment, transportation, financial management, etc. Given the ubiquity of mobile apps in the lives of individuals and the consequent effect of these technologies on society, it is essential to consider the relationship between human values and the development and deployment of mobile apps. The many negative consequences of violating human values such as privacy, fairness or social justice by technology have been documented in recent times. If we can detect these violations in a timely manner, developers can look to better address them. To understand the violation of human values in a range of common mobile apps, we analysed 22,119 app reviews from Google Play Store using natural language processing techniques. We base our values violation detection approach on a widely accepted model of human values; the Schwartz theory of basic human values. The results of our analysis show that 26.5% of the reviews contained text indicating user perceived violations of human values. We found that benevolence and self-direction were the most violated value categories, and conformity and tradition were the least violated categories. Our results also highlight the need for a proactive approach to the alignment of values amongst stakeholders and the use of app reviews as a valuable additional source for mining values requirements.

SEAug 18, 2020

A Systematic Mapping Study on Microservices Architecture in DevOps

Muhammad Waseem, Peng Liang, Mojtaba Shahin

Context: Applying Microservices Architecture (MSA) in DevOps has received significant attention in recent years. However, there exists no comprehensive review of the state of research on this topic. Objective: This work aims to systematically identify, analyze, and classify the literature on MSA in DevOps. Method: A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2009 and July 2018. Results: Forty-seven studies were finally selected and the key results are: (1) Three themes on the research on MSA in DevOps are "microservices development and operations in DevOps", "approaches and tool support for MSA based systems in DevOps", and "MSA migration experiences in DevOps". (2) 24 problems with their solutions regarding implementing MSA in DevOps are identified. (3) MSA is mainly described by using boxes and lines. (4) Most of the quality attributes are positively affected when employing MSA in DevOps. (5) 50 tools that support building MSA based systems in DevOps are collected. (6) The combination of MSA and DevOps has been applied in a wide range of application domains. Conclusions: The results and findings will benefit researchers and practitioners to conduct further research and bring more dedicated solutions for the issues of MSA in DevOps.

SEMay 16, 2020

Architectural Design Space for Modelling and Simulation as a Service: A Review

Mojtaba Shahin, M. Ali Babar, Muhammad Aufeef Chauhan

Modelling and Simulation as a Service (MSaaS) is a promising approach to deploy and execute Modelling and Simulation (M&S) applications quickly and on-demand. An appropriate software architecture is essential to deliver quality M&S applications following the MSaaS concept to a wide range of users. This study aims to characterize the state-of-the-art MSaaS architectures by conducting a systematic review of 31 papers published from 2010 to 2018. Our findings reveal that MSaaS applications are mainly designed using layered architecture style, followed by service-oriented architecture, component-based architecture, and pluggable component-based architecture. We also found that interoperability and deployability have the greatest importance in the architecture of MSaaS applications. In addition, our study indicates that the current MSaaS architectures do not meet the critical user requirements of modern M&S applications appropriately. Based on our results, we recommend that there is a need for more effort and research to (1) design the user interfaces that enable users to build and configure simulation models with minimum effort and limited domain knowledge, (2) provide mechanisms to improve the deployability of M&S applications, and (3) gain a deep insight into how M&S applications should be architected to respond to the emerging user requirements in the military domain.

SEMar 13, 2020

On the Role of Software Architecture in DevOps Transformation: An Industrial Case Study

Mojtaba Shahin, M. Ali Babar

Development and Operations (DevOps), a particular type of Continuous Software Engineering, has become a popular Software System Engineering paradigm. Software architecture is critical in succeeding with DevOps. However, there is little evidence-based knowledge of how software systems are architected in the industry to enable and support DevOps. Since architectural decisions, along with their rationales and implications, are very important in the architecting process, we performed an industrial case study that has empirically identified and synthesized the key architectural decisions considered essential to DevOps transformation by two software development teams. Our study also reveals that apart from the chosen architecture style, DevOps works best with modular architectures. In addition, we found that the performance of the studied teams can improve in DevOps if operations specialists are added to the teams to perform the operations tasks that require advanced expertise. Finally, investment in testing is inevitable for the teams if they want to release software changes faster.

SEAug 27, 2018

An Empirical Study of Architecting for Continuous Delivery and Deployment

Mojtaba Shahin, Mansooreh Zahedi, Muhammad Ali Babar et al.

Recently, many software organizations have been adopting Continuous Delivery and Continuous Deployment (CD) practices to develop and deliver quality software more frequently and reliably. Whilst an increasing amount of the literature covers different aspects of CD, little is known about the role of software architecture in CD and how an application should be (re-) architected to enable and support CD. We have conducted a mixed-methods empirical study that collected data through in-depth, semi-structured interviews with 21 industrial practitioners from 19 organizations, and a survey of 91 professional software practitioners. Based on a systematic and rigorous analysis of the gathered qualitative and quantitative data, we present a conceptual framework to support the process of (re-) architecting for CD. We provide evidence-based insights about practicing CD within monolithic systems and characterize the principle of "small and independent deployment units" as an alternative to the monoliths. Our framework supplements the architecting process in a CD context through introducing the quality attributes (e.g., resilience) that require more attention and demonstrating the strategies (e.g., prioritizing operations concerns) to design operations-friendly architectures. We discuss the key insights (e.g., monoliths and CD are not intrinsically oxymoronic) gained from our study and draw implications for research and practice.

SEMar 21, 2017

Continuous Integration, Delivery and Deployment: A Systematic Review on Approaches, Tools, Challenges and Practices

Mojtaba Shahin, Muhammad Ali Babar, Liming Zhu

Context: Continuous practices, i.e., continuous integration, delivery, and deployment, are the software development industry practices that enable organizations to frequently and reliably release new features and products. With the increasing interest in and literature on continuous practices, it is important to systematically review and synthesize the approaches, tools, challenges, and practices reported for adopting and implementing continuous practices. Objective: This research aimed at systematically reviewing the state of the art of continuous practices to classify approaches and tools, identify challenges and practices in this regard, and identify the gaps for future research. Method: We used systematic literature review (SLR) method for reviewing the peer-reviewed papers on continuous practices published between 2004 and 1st June 2016. We applied thematic analysis method for analysing the data extracted from reviewing 69 papers selected using predefined criteria. Results: We have identified thirty approaches and associated tools, which facilitate the implementation of continuous practices in the following ways: (1) "reducing build and test time in continuous integration (CI)"; (2) "increasing visibility and awareness on build and test results in CI"; (3) "supporting (semi-) automated continuous testing"; (4) "detecting violations, flaws and faults in CI"; (5) "addressing security and scalability issues in deployment pipeline", and (6) "improving dependability and reliability of deployment process". We have also determined a list of critical factors such as "testing (effort and time)", "team awareness and transparency", "good design principles", "customer", "highly skilled and motivated team", "application domain", and "appropriate infrastructure" that should be carefully considered when introducing continuous practices in a given organization.

SEMar 13, 2017

Security Support in Continuous Deployment Pipeline

Faheem Ullah, Adam Johannes Raft, Mojtaba Shahin et al.

Continuous Deployment (CD) has emerged as a new practice in the software industry to continuously and automatically deploy software changes into production. Continuous Deployment Pipeline (CDP) supports CD practice by transferring the changes from the repository to production. Since most of the CDP components run in an environment that has several interfaces to the Internet, these components are vulnerable to various kinds of malicious attacks. This paper reports our work aimed at designing secure CDP by utilizing security tactics. We have demonstrated the effectiveness of five security tactics in designing a secure pipeline by conducting an experiment on two CDPs - one incorporates security tactics while the other does not. Both CDPs have been analyzed qualitatively and quantitatively. We used assurance cases with goal-structured notations for qualitative analysis. For quantitative analysis, we used penetration tools. Our findings indicate that the applied tactics improve the security of the major components (i.e., repository, continuous integration server, main server) of a CDP by controlling access to the components and establishing secure connections.