Ibrahim Habli

h-index31

14papers

433citations

Novelty33%

AI Score29

Ranked #144,598 of 194,257 authors (top 74%)#31,851 in LG (top 79%)

14 Papers

5.1CYMar 29, 2022

A Principles-based Ethics Assurance Argument Pattern for AI and Autonomous Systems

Zoe Porter, Ibrahim Habli, John McDermid et al.

An assurance case is a structured argument, typically produced by safety engineers, to communicate confidence that a critical or complex system, such as an aircraft, will be acceptably safe within its intended context. Assurance cases often inform third party approval of a system. One emerging proposition within the trustworthy AI and autonomous systems (AI/AS) research community is to use assurance cases to instil justified confidence that specific AI/AS will be ethically acceptable when operational in well-defined contexts. This paper substantially develops the proposition and makes it concrete. It brings together the assurance case methodology with a set of ethical principles to structure a principles-based ethics assurance argument pattern. The principles are justice, beneficence, non-maleficence, and respect for human autonomy, with the principle of transparency playing a supporting role. The argument pattern, shortened to the acronym PRAISE, is described. The objective of the proposed PRAISE argument pattern is to provide a reusable template for individual ethics assurance cases, by which engineers, developers, operators, or regulators could justify, communicate, or challenge a claim about the overall ethical acceptability of the use of a specific AI/AS in a given socio-technical context. We apply the pattern to the hypothetical use case of an autonomous robo-taxi service in a city centre.

7.9AIAug 4, 2023

Unravelling Responsibility for AI

Zoe Porter, Philippa Ryan, Phillip Morgan et al.

It is widely acknowledged that we need to establish where responsibility lies for the outputs and impacts of AI-enabled systems. This is important to achieve justice and compensation for victims of AI harms, and to inform policy and engineering practice. But without a clear, thorough understanding of what "responsibility" means, deliberations about where responsibility lies will be, at best, unfocused and incomplete and, at worst, misguided. Furthermore, AI-enabled systems exist within a wider ecosystem of actors, decisions, and governance structures, giving rise to complex networks of responsibility relations. To address these issues, this paper presents a conceptual framework of responsibility, accompanied with a graphical notation and general methodology for visualising these responsibility networks and for tracing different responsibility attributions for AI. Taking the three-part formulation "Actor A is responsible for Occurrence O," the framework unravels the concept of responsibility to clarify that there are different possibilities of who is responsible for AI, senses in which they are responsible, and aspects of events they are responsible for. The notation allows these permutations to be represented graphically. The methodology enables users to apply the framework to specific scenarios. The aim is to offer a foundation to support stakeholders from diverse disciplinary backgrounds to discuss and address complex responsibility questions in hypothesised and real-world cases involving AI. The work is illustrated by application to a fictitious scenario of a fatal collision between a crewless, AI-enabled maritime vessel in autonomous mode and a traditional, crewed vessel at sea.

5.0ROFeb 20, 2023

AERoS: Assurance of Emergent Behaviour in Autonomous Robotic Swarms

Dhaminda B. Abeywickrama, James Wilson, Suet Lee et al.

The behaviours of a swarm are not explicitly engineered. Instead, they are an emergent consequence of the interactions of individual agents with each other and their environment. This emergent functionality poses a challenge to safety assurance. The main contribution of this paper is a process for the safety assurance of emergent behaviour in autonomous robotic swarms called AERoS, following the guidance on the Assurance of Machine Learning for use in Autonomous Systems (AMLAS). We explore our proposed process using a case study centred on a robot swarm operating a public cloakroom.

1.8LGSep 1, 2022

Review of the AMLAS Methodology for Application in Healthcare

Shakir Laher, Carla Brackstone, Sara Reis et al.

In recent years, the number of machine learning (ML) technologies gaining regulatory approval for healthcare has increased significantly allowing them to be placed on the market. However, the regulatory frameworks applied to them were originally devised for traditional software, which has largely rule-based behaviour, compared to the data-driven and learnt behaviour of ML. As the frameworks are in the process of reformation, there is a need to proactively assure the safety of ML to prevent patient safety being compromised. The Assurance of Machine Learning for use in Autonomous Systems (AMLAS) methodology was developed by the Assuring Autonomy International Programme based on well-established concepts in system safety. This review has appraised the methodology by consulting ML manufacturers to understand if it converges or diverges from their current safety assurance practices, whether there are gaps and limitations in its structure and if it is fit for purpose when applied to the healthcare domain. Through this work we offer the view that there is clear utility for AMLAS as a safety assurance methodology when applied to healthcare machine learning technologies, although development of healthcare specific supplementary guidance would benefit those implementing the methodology.

2.3CYDec 30, 2023

What's my role? Modelling responsibility for AI-based safety-critical systems

Philippa Ryan, Zoe Porter, Joanna Al-Qaddoumi et al.

AI-Based Safety-Critical Systems (AI-SCS) are being increasingly deployed in the real world. These can pose a risk of harm to people and the environment. Reducing that risk is an overarching priority during development and operation. As more AI-SCS become autonomous, a layer of risk management via human intervention has been removed. Following an accident it will be important to identify causal contributions and the different responsible actors behind those to learn from mistakes and prevent similar future events. Many authors have commented on the "responsibility gap" where it is difficult for developers and manufacturers to be held responsible for harmful behaviour of an AI-SCS. This is due to the complex development cycle for AI, uncertainty in AI performance, and dynamic operating environment. A human operator can become a "liability sink" absorbing blame for the consequences of AI-SCS outputs they weren't responsible for creating, and may not have understanding of. This cross-disciplinary paper considers different senses of responsibility (role, moral, legal and causal), and how they apply in the context of AI-SCS safety. We use a core concept (Actor(A) is responsible for Occurrence(O)) to create role responsibility models, producing a practical method to capture responsibility relationships and provide clarity on the previously identified responsibility issues. Our paper demonstrates the approach with two examples: a retrospective analysis of the Tempe Arizona fatal collision involving an autonomous vehicle, and a safety focused predictive role-responsibility analysis for an AI-based diabetes co-morbidity predictor. In both examples our primary focus is on safety, aiming to reduce unfair or disproportionate blame being placed on operators or developers. We present a discussion and avenues for future research.

19.3IVMay 12, 2025Code

Metrics that matter: Evaluating image quality metrics for medical image generation

Yash Deo, Yan Jia, Toni Lassila et al.

Evaluating generative models for synthetic medical imaging is crucial yet challenging, especially given the high standards of fidelity, anatomical accuracy, and safety required for clinical applications. Standard evaluation of generated images often relies on no-reference image quality metrics when ground truth images are unavailable, but their reliability in this complex domain is not well established. This study comprehensively assesses commonly used no-reference image quality metrics using brain MRI data, including tumour and vascular images, providing a representative exemplar for the field. We systematically evaluate metric sensitivity to a range of challenges, including noise, distribution shifts, and, critically, localised morphological alterations designed to mimic clinically relevant inaccuracies. We then compare these metric scores against model performance on a relevant downstream segmentation task, analysing results across both controlled image perturbations and outputs from different generative model architectures. Our findings reveal significant limitations: many widely-used no-reference image quality metrics correlate poorly with downstream task suitability and exhibit a profound insensitivity to localised anatomical details crucial for clinical validity. Furthermore, these metrics can yield misleading scores regarding distribution shifts, e.g. data memorisation. This reveals the risk of misjudging model readiness, potentially leading to the deployment of flawed tools that could compromise patient safety. We conclude that ensuring generative models are truly fit for clinical purpose requires a multifaceted validation framework, integrating performance on relevant downstream tasks with the cautious interpretation of carefully selected no-reference image quality metrics.

1.2CYDec 9, 2024

Upstream and Downstream AI Safety: Both on the Same River?

John McDermid, Yan Jia, Ibrahim Habli

Traditional safety engineering assesses systems in their context of use, e.g. the operational design domain (road layout, speed limits, weather, etc.) for self-driving vehicles (including those using AI). We refer to this as downstream safety. In contrast, work on safety of frontier AI, e.g. large language models which can be further trained for downstream tasks, typically considers factors that are beyond specific application contexts, such as the ability of the model to evade human control, or to produce harmful content, e.g. how to make bombs. We refer to this as upstream safety. We outline the characteristics of both upstream and downstream safety frameworks then explore the extent to which the broad AI safety community can benefit from synergies between these frameworks. For example, can concepts such as common mode failures from downstream safety be used to help assess the strength of AI guardrails? Further, can the understanding of the capabilities and limitations of frontier AI be used to inform downstream safety analysis, e.g. where LLMs are fine-tuned to calculate voyage plans for autonomous vessels? The paper identifies some promising avenues to explore and outlines some challenges in achieving synergy, or a confluence, between upstream and downstream safety frameworks.

3.3AIMar 24, 2025

The case for delegated AI autonomy for Human AI teaming in healthcare

Yan Jia, Harriet Evans, Zoe Porter et al.

In this paper we propose an advanced approach to integrating artificial intelligence (AI) into healthcare: autonomous decision support. This approach allows the AI algorithm to act autonomously for a subset of patient cases whilst serving a supportive role in other subsets of patient cases based on defined delegation criteria. By leveraging the complementary strengths of both humans and AI, it aims to deliver greater overall performance than existing human-AI teaming models. It ensures safe handling of patient cases and potentially reduces clinician review time, whilst being mindful of AI tool limitations. After setting the approach within the context of current human-AI teaming models, we outline the delegation criteria and apply them to a specific AI-based tool used in histopathology. The potential impact of the approach and the regulatory requirements for its successful implementation are then discussed.

2.6LGJun 23, 2024

Learning Run-time Safety Monitors for Machine Learning Components

Ozan Vardal, Richard Hawkins, Colin Paterson et al.

For machine learning components used as part of autonomous systems (AS) in carrying out critical tasks it is crucial that assurance of the models can be maintained in the face of post-deployment changes (such as changes in the operating environment of the system). A critical part of this is to be able to monitor when the performance of the model at runtime (as a result of changes) poses a safety risk to the system. This is a particularly difficult challenge when ground truth is unavailable at runtime. In this paper we introduce a process for creating safety monitors for ML components through the use of degraded datasets and machine learning. The safety monitor that is created is deployed to the AS in parallel to the ML component to provide a prediction of the safety risk associated with the model output. We demonstrate the viability of our approach through some initial experiments using publicly available speed sign datasets.

16.4LGSep 1, 2021

The Role of Explainability in Assuring Safety of Machine Learning in Healthcare

Yan Jia, John McDermid, Tom Lawton et al.

Established approaches to assuring safety-critical systems and software are difficult to apply to systems employing ML where there is no clear, pre-defined specification against which to assess validity. This problem is exacerbated by the "opaque" nature of ML where the learnt model is not amenable to human scrutiny. Explainable AI (XAI) methods have been proposed to tackle this issue by producing human-interpretable representations of ML models which can help users to gain confidence and build trust in the ML system. However, little work explicitly investigates the role of explainability for safety assurance in the context of ML development. This paper identifies ways in which XAI methods can contribute to safety assurance of ML-based systems. It then uses a concrete ML-based clinical decision support system, concerning weaning of patients from mechanical ventilation, to demonstrate how XAI methods can be employed to produce evidence to support safety assurance. The results are also represented in a safety argument to show where, and in what way, XAI methods can contribute to a safety case. Overall, we conclude that XAI methods have a valuable role in safety assurance of ML-based systems in healthcare but that they are not sufficient in themselves to assure safety.

19.2LGFeb 2, 2021

Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS)

Richard Hawkins, Colin Paterson, Chiara Picardi et al.

Machine Learning (ML) is now used in a range of systems with results that are reported to exceed, under certain conditions, human performance. Many of these systems, in domains such as healthcare , automotive and manufacturing, exhibit high degrees of autonomy and are safety critical. Establishing justified confidence in ML forms a core part of the safety case for these systems. In this document we introduce a methodology for the Assurance of Machine Learning for use in Autonomous Systems (AMLAS). AMLAS comprises a set of safety case patterns and a process for (1) systematically integrating safety assurance into the development of ML components and (2) for generating the evidence base for explicitly justifying the acceptable safety of these components when integrated into autonomous system applications.

4.4LGJan 11, 2021

A Framework for Assurance of Medication Safety using Machine Learning

Yan Jia, Tom Lawton, John McDermid et al.

Medication errors continue to be the leading cause of avoidable patient harm in hospitals. This paper sets out a framework to assure medication safety that combines machine learning and safety engineering methods. It uses safety analysis to proactively identify potential causes of medication error, based on expert opinion. As healthcare is now data rich, it is possible to augment safety analysis with machine learning to discover actual causes of medication error from the data, and to identify where they deviate from what was predicted in the safety analysis. Combining these two views has the potential to enable the risk of medication errors to be managed proactively and dynamically. We apply the framework to a case study involving thoracic surgery, e.g. oesophagectomy, where errors in giving beta-blockers can be critical to control atrial fibrillation. This case study combines a HAZOP-based safety analysis method known as SHARD with Bayesian network structure learning and process mining to produce the analysis results, showing the potential of the framework for ensuring patient safety, and for transforming the way that safety is managed in complex healthcare environments.

27.6SEMar 18, 2017

Engineering Trustworthy Self-Adaptive Software with Dynamic Assurance Cases

Radu Calinescu, Danny Weyns, Simos Gerasimou et al.

Building on concepts drawn from control theory, self-adaptive software handles environmental and internal uncertainties by dynamically adjusting its architecture and parameters in response to events such as workload changes and component failures. Self-adaptive software is increasingly expected to meet strict functional and non-functional requirements in applications from areas as diverse as manufacturing, healthcare and finance. To address this need, we introduce a methodology for the systematic ENgineering of TRUstworthy Self-adaptive sofTware (ENTRUST). ENTRUST uses a combination of (1) design-time and runtime modelling and verification, and (2) industry-adopted assurance processes to develop trustworthy self-adaptive software and assurance cases arguing the suitability of the software for its intended application. To evaluate the effectiveness of our methodology, we present a tool-supported instance of ENTRUST and its use to develop proof-of-concept self-adaptive software for embedded and service-based systems from the oceanic monitoring and e-finance domains, respectively. The experimental results show that ENTRUST can be used to engineer self-adaptive software systems in different application domains and to generate dynamic assurance cases for these systems.

4.0SEApr 27, 2014

Formalism of Requirements for Safety-Critical Software: Where Does the Benefit Come From?

Ibrahim Habli, Andrew Rae

Safety and assurance standards often rely on the principle that requirements errors can be minimised by expressing the requirements more formally. Although numerous case studies have shown that the act of formalising previously informal requirements finds requirements errors, this principle is really just a hypothesis. An industrially persuasive causal relationship between formalisation and better requirements has yet to be established. We describe multiple competing explanations for this hypothesis, in terms of the levels of precision, re-formulation, expertise, effort and automation that are typically associated with formalising requirements. We then propose an experiment to distinguish between these explanations, without necessarily excluding the possibility that none of them are correct.