SEMar 9, 2022
Towards a Roadmap on Software Engineering for Responsible AIQinghua Lu, Liming Zhu, Xiwei Xu et al.
Although AI is transforming the world, there are serious concerns about its ability to behave and make decisions responsibly. Many ethical regulations, principles, and frameworks for responsible AI have been issued recently. However, they are high level and difficult to put into practice. On the other hand, most AI researchers focus on algorithmic solutions, while the responsible AI challenges actually crosscut the entire engineering lifecycle and components of AI systems. To close the gap in operationalizing responsible AI, this paper aims to develop a roadmap on software engineering for responsible AI. The roadmap focuses on (i) establishing multi-level governance for responsible AI systems, (ii) setting up the development processes incorporating process-oriented practices for responsible AI systems, and (iii) building responsible-AI-by-design into AI systems through system-level architectural style, patterns and techniques.
AISep 12, 2022
Responsible AI Pattern Catalogue: A Collection of Best Practices for AI Governance and EngineeringQinghua Lu, Liming Zhu, Xiwei Xu et al.
Responsible AI is widely considered as one of the greatest scientific challenges of our time and is key to increase the adoption of AI. Recently, a number of AI ethics principles frameworks have been published. However, without further guidance on best practices, practitioners are left with nothing much beyond truisms. Also, significant efforts have been placed at algorithm-level rather than system-level, mainly focusing on a subset of mathematics-amenable ethical principles, such as fairness. Nevertheless, ethical issues can arise at any step of the development lifecycle, cutting across many AI and non-AI components of systems beyond AI algorithms and models. To operationalize responsible AI from a system perspective, in this paper, we present a Responsible AI Pattern Catalogue based on the results of a Multivocal Literature Review (MLR). Rather than staying at the principle or algorithm level, we focus on patterns that AI system stakeholders can undertake in practice to ensure that the developed AI systems are responsible throughout the entire governance and engineering lifecycle. The Responsible AI Pattern Catalogue classifies the patterns into three groups: multi-level governance patterns, trustworthy process patterns, and responsible-AI-by-design product patterns. These patterns provide systematic and actionable guidance for stakeholders to implement responsible AI.
AIMar 2, 2022
Responsible-AI-by-Design: a Pattern Collection for Designing Responsible AI SystemsQinghua Lu, Liming Zhu, Xiwei Xu et al.
Although AI has significant potential to transform society, there are serious concerns about its ability to behave and make decisions responsibly. Many ethical regulations, principles, and guidelines for responsible AI have been issued recently. However, these principles are high-level and difficult to put into practice. In the meantime much effort has been put into responsible AI from the algorithm perspective, but they are limited to a small subset of ethical principles amenable to mathematical analysis. Responsible AI issues go beyond data and algorithms and are often at the system-level crosscutting many system components and the entire software engineering lifecycle. Based on the result of a systematic literature review, this paper identifies one missing element as the system-level guidance - how to design the architecture of responsible AI systems. We present a summary of design patterns that can be embedded into the AI systems as product features to contribute to responsible-AI-by-design.
SEJun 23, 2023
Exploring Qualitative Research Using LLMsMuneera Bano, Didar Zowghi, Jon Whittle
The advent of AI driven large language models (LLMs) have stirred discussions about their role in qualitative research. Some view these as tools to enrich human understanding, while others perceive them as threats to the core values of the discipline. This study aimed to compare and contrast the comprehension capabilities of humans and LLMs. We conducted an experiment with small sample of Alexa app reviews, initially classified by a human analyst. LLMs were then asked to classify these reviews and provide the reasoning behind each classification. We compared the results with human classification and reasoning. The research indicated a significant alignment between human and ChatGPT 3.5 classifications in one third of cases, and a slightly lower alignment with GPT4 in over a quarter of cases. The two AI models showed a higher alignment, observed in more than half of the instances. However, a consensus across all three methods was seen only in about one fifth of the classifications. In the comparison of human and LLMs reasoning, it appears that human analysts lean heavily on their individual experiences. As expected, LLMs, on the other hand, base their reasoning on the specific word choices found in app reviews and the functional components of the app itself. Our results highlight the potential for effective human LLM collaboration, suggesting a synergistic rather than competitive relationship. Researchers must continuously evaluate LLMs role in their work, thereby fostering a future where AI and humans jointly enrich qualitative research.
AINov 22, 2023
Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based AgentsQinghua Lu, Liming Zhu, Xiwei Xu et al.
Foundation models, such as large language models (LLMs), have been widely recognised as transformative AI technologies due to their capabilities to understand and generate content, including plans with reasoning capabilities. Foundation model based agents derive their autonomy from the capabilities of foundation models, which enable them to autonomously break down a given goal into a set of manageable tasks and orchestrate task execution to meet the goal. Despite the huge efforts put into building foundation model based agents, the architecture design of the agents has not yet been systematically explored. Also, while there are significant benefits of using agents for planning and execution, there are serious considerations regarding responsible AI related software quality attributes, such as security and accountability. Therefore, this paper presents a pattern-oriented reference architecture that serves as guidance when designing foundation model based agents. We evaluate the completeness and utility of the proposed reference architecture by mapping it to the architecture of two real-world agents.
CLApr 13, 2023
A Reference Architecture for Designing Foundation Model based SystemsQinghua Lu, Liming Zhu, Xiwei Xu et al.
The release of ChatGPT, Gemini, and other large language model has drawn huge interests on foundations models. There is a broad consensus that foundations models will be the fundamental building blocks for future AI systems. However, there is a lack of systematic guidance on the architecture design. Particularly, the the rapidly growing capabilities of foundations models can eventually absorb other components of AI systems, posing challenges of moving boundary and interface evolution in architecture design. Furthermore, incorporating foundations models into AI systems raises significant concerns about responsible and safe AI due to their opaque nature and rapidly advancing intelligence. To address these challenges, the paper first presents an architecture evolution of AI systems in the era of foundation models, transitioning from "foundation-model-as-a-connector" to "foundation-model-as-a-monolithic architecture". The paper then identifies key design decisions and proposes a pattern-oriented reference architecture for designing responsible foundation-model-based systems. The patterns can enable the potential of foundation models while ensuring associated risks.
CYAug 2, 2024
Responsible AI Question Bank: A Comprehensive Tool for AI Risk AssessmentSung Une Lee, Harsha Perera, Yue Liu et al.
The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit remains lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks, aligning with emerging regulations like the EU AI Act, and enhancing overall AI governance. A key benefit of the RAI Question Bank is its systematic approach to linking lower-level risk questions to higher-level ones and related themes, preventing siloed assessments and ensuring a cohesive evaluation process. Case studies illustrate the practical application of the RAI Question Bank in assessing AI projects, from evaluating risk factors to informing decision-making processes. The study also demonstrates how the RAI Question Bank can be used to ensure compliance with standards, mitigate risks, and promote the development of trustworthy AI systems. This work advances RAI by providing organizations with a valuable tool to navigate the complexities of ethical AI development and deployment while ensuring comprehensive risk management.
SEJan 3, 2023
Developing Responsible Chatbots for Financial Services: A Pattern-Oriented Responsible AI Engineering ApproachQinghua Lu, Yuxiu Luo, Liming Zhu et al.
The recent release of ChatGPT has gained huge attention and discussion worldwide, with responsible AI being a key topic of discussion. How can we ensure that AI systems, including ChatGPT, are developed and adopted in a responsible way? To tackle the responsible AI challenges, various ethical principles have been released by governments, organisations, and companies. However, those principles are very abstract and not practical enough. Further, significant efforts have been put on algorithm-level solutions that only address a narrow set of principles, such as fairness and privacy. To fill the gap, we adopt a pattern-oriented responsible AI engineering approach and build a Responsible AI Pattern Catalogue to operationalise responsible AI from a system perspective. In this article, we first summarise the major challenges in operationalising responsible AI at scale and introduce how we use the Responsible AI Pattern Catalogue to address those challenges. We then examine the risks at each stage of the chatbot development process and recommend pattern-driven mitigations to evaluate the the usefulness of the Responsible AI Pattern Catalogue in a real-world setting.
AIMay 16, 2024
Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based AgentsYue Liu, Sin Kit Lo, Qinghua Lu et al.
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 18 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. We propose a decision model for selecting the patterns. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
CYMar 22, 2025
A Qualitative Study of User Perception of M365 AI CopilotMuneera Bano, Didar Zowghi, Jon Whittle et al.
Adopting AI copilots in professional workflows presents opportunities for enhanced productivity, efficiency, and decision making. In this paper, we present results from a six month trial of M365 Copilot conducted at our organisation in 2024. A qualitative interview study was carried out with 27 participants. The study explored user perceptions of M365 Copilot's effectiveness, productivity impact, evolving expectations, ethical concerns, and overall satisfaction. Initial enthusiasm for the tool was met with mixed post trial experiences. While some users found M365 Copilot beneficial for tasks such as email coaching, meeting summaries, and content retrieval, others reported unmet expectations in areas requiring deeper contextual understanding, reasoning, and integration with existing workflows. Ethical concerns were a recurring theme, with users highlighting issues related to data privacy, transparency, and AI bias. While M365 Copilot demonstrated value in specific operational areas, its broader impact remained constrained by usability limitations and the need for human oversight to validate AI generated outputs.
SEMay 9, 2023
A Taxonomy of Foundation Model based Systems through the Lens of Software ArchitectureQinghua Lu, Liming Zhu, Xiwei Xu et al.
The recent release of large language model (LLM) based chatbots, such as ChatGPT, has attracted huge interest in foundation models. It is widely believed that foundation models will serve as the fundamental building blocks for future AI systems. As foundation models are in their early stages, the design of foundation model based systems has not yet been systematically explored. There is limited understanding about the impact of introducing foundation models in software architecture. Therefore, in this paper, we propose a taxonomy of foundation model based systems, which classifies and compares the characteristics of foundation models and design options of foundation model based systems. Our taxonomy comprises three categories: the pretraining and adaptation of foundation models, the architecture design of foundation model based systems, and responsible-AI-by-design. This taxonomy can serve as concrete guidance for making major architectural design decisions when designing foundation model based systems and highlights trade-offs arising from design decisions.
CYDec 14, 2021
AI Ethics Principles in Practice: Perspectives of Designers and DevelopersConrad Sanderson, David Douglas, Qinghua Lu et al.
As consensus across the various published AI ethics principles is approached, a gap remains between high-level principles and practical techniques that can be readily adopted to design and develop responsible AI systems. We examine the practices and experiences of researchers and engineers from Australia's national scientific research agency (CSIRO), who are involved in designing and developing AI systems for many application areas. Semi-structured interviews were used to examine how the practices of the participants relate to and align with a set of high-level AI ethics principles proposed by the Australian Government. The principles comprise: (1) privacy protection and security, (2) reliability and safety, (3) transparency and explainability, (4) fairness, (5) contestability, (6) accountability, (7) human-centred values, (8) human, social and environmental wellbeing. Discussions on the gained insights from the interviews include various tensions and trade-offs between the principles, and provide suggestions for implementing each high-level principle. We also present suggestions aiming to enhance associated support mechanisms.
SENov 30, 2021
The Impact of Considering Human Values during Requirements Engineering ActivitiesHarsha Perera, Rashina Hoda, Rifat Ara Shams et al.
Human values, or what people hold important in their life, such as freedom, fairness, and social responsibility, often remain unnoticed and unattended during software development. Ignoring values can lead to values violations in software that can result in financial losses, reputation damage, and widespread social and legal implications. However, embedding human values in software is not only non-trivial but also generally an unclear process. Commencing as early as during the Requirements Engineering (RE) activities promises to ensure fit-for-purpose and quality software products that adhere to human values. But what is the impact of considering human values explicitly during early RE activities? To answer this question, we conducted a scenario-based survey where 56 software practitioners contextualised requirements analysis towards a proposed mobile application for the homeless and suggested values-laden software features accordingly. The suggested features were qualitatively analysed. Results show that explicit considerations of values can help practitioners identify applicable values, associate purpose with the features they develop, think outside-the-box, and build connections between software features and human values. Finally, drawing from the results and experiences of this study, we propose a scenario-based values elicitation process -- a simple four-step takeaway as a practical implication of this study.
AINov 18, 2021
Software Engineering for Responsible AI: An Empirical Study and Operationalised PatternsQinghua Lu, Liming Zhu, Xiwei Xu et al.
Although artificial intelligence (AI) is solving real-world challenges and transforming industries, there are serious concerns about its ability to behave and make decisions in a responsible way. Many AI ethics principles and guidelines for responsible AI have been recently issued by governments, organisations, and enterprises. However, these AI ethics principles and guidelines are typically high-level and do not provide concrete guidance on how to design and develop responsible AI systems. To address this shortcoming, we first present an empirical study where we interviewed 21 scientists and engineers to understand the practitioners' perceptions on AI ethics principles and their implementation. We then propose a template that enables AI ethics principles to be operationalised in the form of concrete patterns and suggest a list of patterns using the newly created template. These patterns provide concrete, operationalised guidance that facilitate the development of responsible AI systems.
SEOct 11, 2021
Human Values in Mobile App Development: An Empirical Study on Bangladeshi Agriculture Mobile AppsRifat Ara Shams, Mojtaba Shahin, Gillian Oliver et al.
Given the ubiquity of mobile applications (apps) in daily lives, understanding and reflecting end-users' human values (e.g., transparency, privacy, social recognition etc.) in apps has become increasingly important. Violations of end users' values by software applications have been reported in the media and have resulted in a wide range of difficulties for end users. Value violations may bring more and lasting problems for marginalized and vulnerable groups of end-users. This research aims to understand the extent to which the values of Bangladeshi female farmers, marginalized and vulnerable end-users, who are less studied by the software engineering community, are reflected in agriculture apps in Bangladesh. Further to this, we aim to identify possible strategies to embed their values in those apps. To this end, we conducted a mixed-methods empirical study consisting of 13 interviews with app practitioners and four focus groups with 20 Bangladeshi female farmers. The accumulated results from the interviews and focus groups identified 22 values of Bangladeshi female farmers, which the participants expect to be reflected in the agriculture apps. Among these 22 values, 15 values (e.g., accuracy, independence) are already reflected and 7 values (e.g., accessibility, pleasure) are ignored/violated in the existing agriculture apps. We also identified 14 strategies (e.g., "applying human-centered approaches to elicit values", "establishing a dedicated team/person for values concerns") to address Bangladeshi female farmers' values in agriculture apps.
SEOct 5, 2021
Does Domain Change the Opinion of Individuals on Human Values? A Preliminary Investigation on eHealth Apps End-usersHumphrey Obie, Mojtaba Shahin, John Grundy et al.
The elicitation of end-users' human values - such as freedom, honesty, transparency, etc. - is important in the development of software systems. We carried out two preliminary Q-studies to understand (a) the general human value opinion types of eHealth applications (apps) end-users (b) the eHealth domain human value opinion types of eHealth apps end-users (c) whether there are differences between the general and eHealth domain opinion types. Our early results show three value opinion types using generic value instruments: (1) fun-loving, success-driven and independent end-user, (2) security-conscious, socially-concerned, and success-driven end-user, and (3) benevolent, success-driven, and conformist end-user Our results also show two value opinion types using domain-specific value instruments: (1) security-conscious, reputable, and honest end-user, and (2) success-driven, reputable and pain-avoiding end-user. Given these results, consideration should be given to domain context in the design and application of values elicitation instruments.
SEOct 2, 2021
How Secondary School Girls Perceive Computational Thinking Practices through Collaborative Programming with the Micro:bitMojtaba Shahin, Chris Gonsalvez, Jon Whittle et al.
Computational Thinking (CT) has been investigated from different perspectives. This research aims to investigate how secondary school girls perceive CT practices -- the problem-solving practices that students apply while they are engaged in programming -- when using the micro:bit device in a collaborative setting. This study also explores the collaborative programming process of secondary school girls with the micro:bit device. We conducted mixed-methods research with 203 secondary school girls (in the state of Victoria, Australia) and 31 mentors attending a girls-only CT program (OzGirlsCT program). The girls were grouped into 52 teams and collaboratively developed computational solutions around realistic, important problems to them and their communities. We distributed two surveys (with 193 responses each) to the girls. Further, we surveyed the mentors (with 31 responses) who monitored the girls, and collected their observation reports on their teams. Our study indicates that the girls found "debugging" the most difficult type of CT practice to apply, while collaborative practices of CT were the easiest. We found that prior coding experience significantly reduced the difficulty level of only one CT practice - "debugging". Our study also identified six challenges the girls faced and six best practices they adopted when working on their computational solutions.
SEAug 12, 2021
Operationalizing Human Values in Software Engineering: A SurveyMojtaba Shahin, Waqar Hussain, Arif Nurwidyantoro et al.
Human values (e.g., pleasure, privacy, and social justice) are what a person or a society considers important. The inability to address them in software-intensive systems can result in numerous undesired consequences (e.g., financial losses) for individuals and communities. Various solutions (e.g., methodologies, techniques) are developed to help "operationalize values in software". The ultimate goal is to ensure building software (better) reflects and respects human values. In this survey, "operationalizing values" is referred to as the process of identifying human values and translating them to accessible and concrete concepts so that they can be implemented, validated, verified, and measured in software. This paper provides a deep understanding of the research landscape on operationalizing values in software engineering, covering 51 primary studies. It also presents an analysis and taxonomy of 51 solutions for operationalizing values in software engineering. Our survey reveals that most solutions attempt to help operationalize values in the early phases (requirements and design) of the software development life cycle. However, the later phases (implementation and testing) and other aspects of software development (e.g., "team organization") still need adequate consideration. We outline implications for research and practice and identify open issues and future research directions to advance this area.
SEJul 23, 2021
Towards a Human Values Dashboard for Software Development: An Exploratory StudyArif Nurwidyantoro, Mojtaba Shahin, Michel Chaudron et al.
Background: There is a growing awareness of the importance of human values (e.g., inclusiveness, privacy) in software systems. However, there are no practical tools to support the integration of human values during software development. We argue that a tool that can identify human values from software development artefacts and present them to varying software development roles can (partially) address this gap. We refer to such a tool as human values dashboard. Further to this, our understanding of such a tool is limited. Aims: This study aims to (1) investigate the possibility of using a human values dashboard to help address human values during software development, (2) identify possible benefits of using a human values dashboard, and (3) elicit practitioners' needs from a human values dashboard. Method: We conducted an exploratory study by interviewing 15 software practitioners. A dashboard prototype was developed to support the interview process. We applied thematic analysis to analyse the collected data. Results: Our study finds that a human values dashboard would be useful for the development team (e.g., project manager, developer, tester). Our participants acknowledge that development artefacts, especially requirements documents and issue discussions, are the most suitable source for identifying values for the dashboard. Our study also yields a set of high-level user requirements for a human values dashboard (e.g., it shall allow determining values priority of a project). Conclusions: Our study suggests that a values dashboard is potentially used to raise awareness of values and support values-based decision-making in software development. Future work will focus on addressing the requirements and using issue discussions as potential artefacts for the dashboard.
AIMay 19, 2021
AI and Ethics -- Operationalising Responsible AILiming Zhu, Xiwei Xu, Qinghua Lu et al.
In the last few years, AI continues demonstrating its positive impact on society while sometimes with ethically questionable consequences. Building and maintaining public trust in AI has been identified as the key to successful and sustainable innovation. This chapter discusses the challenges related to operationalizing ethical AI principles and presents an integrated view that covers high-level ethical AI principles, the general notion of trust/trustworthiness, and product/process support in the context of responsible AI, which helps improve both trust and trustworthiness of AI for a wider set of stakeholders.
SEFeb 24, 2021
How Can Human Values Be Addressed in Agile Methods? A Case Study on SAFeWaqar Hussain, Mojtaba Shahin, Rashina Hoda et al.
Agile methods are predominantly focused on delivering business values. But can Agile methods be adapted to effectively address and deliver human values such as social justice, privacy, and sustainability in the software they produce? Human values are what an individual or a society considers important in life. Ignoring these human values in software can pose difficulties or risks for all stakeholders (e.g., user dissatisfaction, reputation damage, financial loss). To answer this question, we selected the Scaled Agile Framework (SAFe), one of the most commonly used Agile methods in the industry, and conducted a qualitative case study to identify possible intervention points within SAFe that are the most natural to address and integrate human values in software. We present five high-level empirically-justified sets of interventions in SAFe: artefacts, roles, ceremonies, practices, and culture. We elaborate how some current Agile artefacts (e.g., user story), roles (e.g., product owner), ceremonies (e.g., stand-up meeting), and practices (e.g., business-facing testing) in SAFe can be modified to support the inclusion of human values in software. Further, our study suggests new and exclusive values-based artefacts (e.g., legislative requirement), ceremonies (e.g., values conversation), roles (e.g., values champion), and cultural practices (e.g., induction and hiring) to be introduced in SAFe for this purpose. Guided by our findings, we argue that existing Agile methods can account for human values in software delivery with some evolutionary adaptations.
SEDec 18, 2020
A First Look at Human Values-Violation in App ReviewsHumphrey O. Obie, Waqar Hussain, Xin Xia et al.
Ubiquitous technologies such as mobile software applications (mobile apps) have a tremendous influence on the evolution of the social, cultural, economic, and political facets of life in society. Mobile apps fulfil many practical purposes for users including entertainment, transportation, financial management, etc. Given the ubiquity of mobile apps in the lives of individuals and the consequent effect of these technologies on society, it is essential to consider the relationship between human values and the development and deployment of mobile apps. The many negative consequences of violating human values such as privacy, fairness or social justice by technology have been documented in recent times. If we can detect these violations in a timely manner, developers can look to better address them. To understand the violation of human values in a range of common mobile apps, we analysed 22,119 app reviews from Google Play Store using natural language processing techniques. We base our values violation detection approach on a widely accepted model of human values; the Schwartz theory of basic human values. The results of our analysis show that 26.5% of the reviews contained text indicating user perceived violations of human values. We found that benevolence and self-direction were the most violated value categories, and conformity and tradition were the least violated categories. Our results also highlight the need for a proactive approach to the alignment of values amongst stakeholders and the use of app reviews as a valuable additional source for mining values requirements.
SEJul 18, 2019
A Study on the Prevalence of Human Values in Software Engineering Publications, 2015-2018Harsha Perera, Arif Nurwidyantoro, Waqar Hussain et al.
Failure to account for human values in software (e.g., equality and fairness) can result in user dissatisfaction and negative socio-economic impact. Engineering these values in software, however, requires technical and methodological support throughout the development life cycle. This paper investigates to what extent software engineering (SE) research has considered human values. We investigate the prevalence of human values in recent (2015 - 2018) publications at some of the top-tier SE conferences and journals. We classify SE publications, based on their relevance to different values, against a widely used value structure adopted from social sciences. Our results show that: (a) only a small proportion of the publications directly consider values, classified as relevant publications; (b) for the majority of the values, very few or no relevant publications were found; and (c) the prevalence of the relevant publications was higher in SE conferences compared to SE journals. This paper shares these and other insights that motivate research on human values in software engineering.