MLApr 4, 2023
Learning from data with structured missingnessRobin Mitra, Sarah F. McGough, Tapabrata Chakraborti et al. · oxford
Missing data are an unavoidable complication in many machine learning tasks. When data are `missing at random' there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such `structured missingness' raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here, we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
CYApr 6, 2022
Advancing Data Justice Research and Practice: An Integrated Literature ReviewDavid Leslie, Michael Katell, Mhairi Aitken et al.
The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic and global data innovation ecosystems. In this integrated literature review we hope to lay the conceptual groundwork needed to support this aspiration. The introduction motivates the broadening of data justice that is undertaken by the literature review which follows. First, we address how certain limitations of the current study of data justice drive the need for a re-location of data justice research and practice. We map out the strengths and shortcomings of the contemporary state of the art and then elaborate on the challenges faced by our own effort to broaden the data justice perspective in the decolonial context. The body of the literature review covers seven thematic areas. For each theme, the ADJRP team has systematically collected and analysed key texts in order to tell the critical empirical story of how existing social structures and power dynamics present challenges to data justice and related justice fields. In each case, this critical empirical story is also supplemented by the transformational story of how activists, policymakers, and academics are challenging longstanding structures of inequity to advance social justice in data innovation ecosystems and adjacent areas of technological practice.
CYApr 6, 2022
Data Justice Stories: A Repository of Case StudiesDavid Leslie, Morgan Briggs, Antonella Perini et al.
The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice-a commitment to the achievement of a society that is equitable, fair, and capable of confronting the root causes of injustice.However, despite the seeming novelty of such a data justice pedigree, this joining up of the critique of the power imbalances that have shaped the digital and "big data" revolutions with a commitment to social equity and constructive societal transformation has a deeper historical, and more geographically diverse, provenance. As the stories of the data justice initiatives, activism, and advocacy contained in this volume well evidence, practices of data justice across the globe have, in fact, largely preceded the elaboration and crystallisation of the idea of data justice in contemporary academic discourse. In telling these data justice stories, we hope to provide the reader with two interdependent tools of data justice thinking: First, we aim to provide the reader with the critical leverage needed to discern those distortions and malformations of data justice that manifest in subtle and explicit forms of power, domination, and coercion. Second, we aim to provide the reader with access to the historically effective forms of normativity and ethical insight that have been marshalled by data justice activists and advocates as tools of societal transformation-so that these forms of normativity and insight can be drawn on, in turn, as constructive resources to spur future transformative data justice practices.
CYApr 12, 2022
Data Justice in Practice: A Guide for DevelopersDavid Leslie, Michael Katell, Mhairi Aitken et al.
The Advancing Data Justice Research and Practice project aims to broaden understanding of the social, historical, cultural, political, and economic forces that contribute to discrimination and inequity in contemporary ecologies of data collection, governance, and use. This is the consultation draft of a guide for developers and organisations, which are producing, procuring, or using data-intensive technologies.In the first section, we introduce the field of data justice, from its early discussions to more recent proposals to relocate understandings of what data justice means. This section includes a description of the six pillars of data justice around which this guidance revolves. Next, to support developers in designing, developing, and deploying responsible and equitable data-intensive and AI/ML systems, we outline the AI/ML project lifecycle through a sociotechnical lens. To support the operationalisation data justice throughout the entirety of the AI/ML lifecycle and within data innovation ecosystems, we then present five overarching principles of responsible, equitable, and trustworthy data research and innovation practices, the SAFE-D principles-Safety, Accountability, Fairness, Explainability, and Data Quality, Integrity, Protection, and Privacy. The final section presents guiding questions that will help developers both address data justice issues throughout the AI/ML lifecycle and engage in reflective innovation practices that ensure the design, development, and deployment of responsible and equitable data-intensive and AI/ML systems.
CYJun 12, 2022
Don't "research fast and break things": On the ethics of Computational Social ScienceDavid Leslie
This article is concerned with setting up practical guardrails within the research activities and environments of CSS. It aims to provide CSS scholars, as well as policymakers and other stakeholders who apply CSS methods, with the critical and constructive means needed to ensure that their practices are ethical, trustworthy, and responsible. It begins by providing a taxonomy of the ethical challenges faced by researchers in the field of CSS. These are challenges related to (1) the treatment of research subjects, (2) the impacts of CSS research on affected individuals and communities, (3) the quality of CSS research and to its epistemological status, (4) research integrity, and (5) research equity. Taking these challenges as a motivation for cultural transformation, it then argues for the end-to-end incorporation of habits of responsible research and innovation (RRI) into CSS practices, focusing on the role that contextual considerations, anticipatory reflection, impact assessment, public engagement, and justifiable and well-documented action should play across the research lifecycle. In proposing the inclusion of habits of RRI in CSS practices, the chapter lays out several practical steps needed for ethical, trustworthy, and responsible CSS research activities. These include stakeholder engagement processes, research impact assessments, data lifecycle documentation, bias self-assessments, and transparent research reporting protocols.
CYFeb 19, 2024
AI Fairness in PracticeDavid Leslie, Cami Rincon, Morgan Briggs et al.
Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can help project teams better identify, mitigate, and manage the many ways that unfair bias and discrimination can crop up across the AI project workflow. We begin by exploring how, despite the plurality of understandings about the meaning of fairness, priorities of equality and non-discrimination have come to constitute the broadly accepted core of its application as a practical principle. We focus on how these priorities manifest in the form of equal protection from direct and indirect discrimination and from discriminatory harassment. These elements form ethical and legal criteria based upon which instances of unfair bias and discrimination can be identified and mitigated across the AI project workflow. We then take a deeper dive into how the different contexts of the AI project lifecycle give rise to different fairness concerns. This allows us to identify several types of AI Fairness (Data Fairness, Application Fairness, Model Design and Development Fairness, Metric-Based Fairness, System Implementation Fairness, and Ecosystem Fairness) that form the basis of a multi-lens approach to bias identification, mitigation, and management. Building on this, we discuss how to put the principle of AI Fairness into practice across the AI project workflow through Bias Self-Assessment and Bias Risk Management as well as through the documentation of metric-based fairness criteria in a Fairness Position Statement.
CYFeb 19, 2024
AI Ethics and Governance in Practice: An IntroductionDavid Leslie, Cami Rincon, Morgan Briggs et al.
AI systems may have transformative and long-term effects on individuals and society. To manage these impacts responsibly and direct the development of AI systems toward optimal public benefit, considerations of AI ethics and governance must be a first priority. In this workbook, we introduce and describe our PBG Framework, a multi-tiered governance model that enables project teams to integrate ethical values and practical principles into their innovation practices and to have clear mechanisms for demonstrating and documenting this.
CYFeb 19, 2024
AI Sustainability in Practice Part Two: Sustainability Throughout the AI WorkflowDavid Leslie, Cami Rincon, Morgan Briggs et al.
The sustainability of AI systems depends on the capacity of project teams to proceed with a continuous sensitivity to their potential real-world impacts and transformative effects. Stakeholder Impact Assessments (SIAs) are governance mechanisms that enable this kind of responsiveness. They are tools that create a procedure for, and a means of documenting, the collaborative evaluation and reflective anticipation of the possible harms and benefits of AI innovation projects. SIAs are not one-off governance actions. They require project teams to pay continuous attention to the dynamic and changing character of AI production and use and to the shifting conditions of the real-world environments in which AI technologies are embedded. This workbook is part two of two workbooks on AI Sustainability. It provides a template of the SIA and activities that allow a deeper dive into crucial parts of it. It discusses methods for weighing values and considering trade-offs during the SIA. And, it highlights the need to treat the SIA as an end-to-end process of responsive evaluation and re-assessment.
CYFeb 19, 2024
AI Sustainability in Practice Part One: Foundations for Sustainable AI ProjectsDavid Leslie, Cami Rincon, Morgan Briggs et al.
Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical impacts and steers responsible innovation practices. This workbook is the first part of a pair that provides the concepts and tools needed to put AI Sustainability into practice. It introduces the SUM Values, which help AI project teams to assess the potential societal impacts and ethical permissibility of their projects. It then presents a Stakeholder Engagement Process (SEP), which provides tools to facilitate proportionate engagement of and input from stakeholders with an emphasis on equitable and meaningful participation and positionality awareness.
AIFeb 6, 2022
Human rights, democracy, and the rule of law assurance framework for AI systems: A proposalDavid Leslie, Christopher Burr, Mhairi Aitken et al.
Following on from the publication of its Feasibility Study in December 2020, the Council of Europe's Ad Hoc Committee on Artificial Intelligence (CAHAI) and its subgroups initiated efforts to formulate and draft its Possible Elements of a Legal Framework on Artificial Intelligence, based on the Council of Europe's standards on human rights, democracy, and the rule of law. This document was ultimately adopted by the CAHAI plenary in December 2021. To support this effort, The Alan Turing Institute undertook a programme of research that explored the governance processes and practical tools needed to operationalise the integration of human right due diligence with the assurance of trustworthy AI innovation practices. The resulting framework was completed and submitted to the Council of Europe in September 2021. It presents an end-to-end approach to the assurance of AI project lifecycles that integrates context-based risk analysis and appropriate stakeholder engagement with comprehensive impact assessment, and transparent risk management, impact mitigation, and innovation assurance practices. Taken together, these interlocking processes constitute a Human Rights, Democracy and the Rule of Law Assurance Framework (HUDERAF). The HUDERAF combines the procedural requirements for principles-based human rights due diligence with the governance mechanisms needed to set up technical and socio-technical guardrails for responsible and trustworthy AI innovation practices. Its purpose is to provide an accessible and user-friendly set of mechanisms for facilitating compliance with a binding legal framework on artificial intelligence, based on the Council of Europe's standards on human rights, democracy, and the rule of law, and to ensure that AI innovation projects are carried out with appropriate levels of public accountability, transparency, and democratic governance.
MLJun 3, 2021
Gaussian Processes on HypergraphsThomas Pinder, Kathryn Turnbull, Christopher Nemeth et al.
We derive a Matern Gaussian process (GP) on the vertices of a hypergraph. This enables estimation of regression models of observed or latent values associated with the vertices, in which the correlation and uncertainty estimates are informed by the hypergraph structure. We further present a framework for embedding the vertices of a hypergraph into a latent space using the hypergraph GP. Finally, we provide a scheme for identifying a small number of representative inducing vertices that enables scalable inference through sparse GPs. We demonstrate the utility of our framework on three challenging real-world problems that concern multi-class classification for the political party affiliation of legislators on the basis of voting behaviour, probabilistic matrix factorisation of movie reviews, and embedding a hypergraph of animals into a low-dimensional latent space.
CYApr 30, 2021
Does "AI" stand for augmenting inequality in the era of covid-19 healthcare?David Leslie, Anjali Mazumder, Aidan Peppin et al.
Among the most damaging characteristics of the covid-19 pandemic has been its disproportionate effect on disadvantaged communities. As the outbreak has spread globally, factors such as systemic racism, marginalisation, and structural inequality have created path dependencies that have led to poor health outcomes. These social determinants of infectious disease and vulnerability to disaster have converged to affect already disadvantaged communities with higher levels of economic instability, disease exposure, infection severity, and death. Artificial intelligence (AI) technologies are an important part of the health informatics toolkit used to fight contagious disease. AI is well known, however, to be susceptible to algorithmic biases that can entrench and augment existing inequality. Uncritically deploying AI in the fight against covid-19 thus risks amplifying the pandemic's adverse effects on vulnerable groups, exacerbating health inequity. In this paper, we claim that AI systems can introduce or reflect bias and discrimination in three ways: in patterns of health discrimination that become entrenched in datasets, in data representativeness, and in human choices made during the design, development, and deployment of these systems. We highlight how the use of AI technologies threaten to exacerbate the disparate effect of covid-19 on marginalised, under-represented, and vulnerable groups, particularly black, Asian, and other minoritised ethnic people, older populations, and those of lower socioeconomic status. We conclude that, to mitigate the compounding effects of AI on inequalities associated with covid-19, decision makers, technology developers, and health officials must account for the potential biases and inequities at all stages of the AI process.
CYApr 2, 2021
Artificial intelligence, human rights, democracy, and the rule of law: a primerDavid Leslie, Christopher Burr, Mhairi Aitken et al.
In September 2019, the Council of Europe's Committee of Ministers adopted the terms of reference for the Ad Hoc Committee on Artificial Intelligence (CAHAI). The CAHAI is charged with examining the feasibility and potential elements of a legal framework for the design, development, and deployment of AI systems that accord with Council of Europe standards across the interrelated areas of human rights, democracy, and the rule of law. As a first and necessary step in carrying out this responsibility, the CAHAI's Feasibility Study, adopted by its plenary in December 2020, has explored options for an international legal response that fills existing gaps in legislation and tailors the use of binding and non-binding legal instruments to the specific risks and opportunities presented by AI systems. The Study examines how the fundamental rights and freedoms that are already codified in international human rights law can be used as the basis for such a legal framework. The purpose of this primer is to introduce the main concepts and principles presented in the CAHAI's Feasibility Study for a general, non-technical audience. It also aims to provide some background information on the areas of AI innovation, human rights law, technology policy, and compliance mechanisms covered therein. In keeping with the Council of Europe's commitment to broad multi-stakeholder consultations, outreach, and engagement, this primer has been designed to help facilitate the meaningful and informed participation of an inclusive group of stakeholders as the CAHAI seeks feedback and guidance regarding the essential issues raised by the Feasibility Study.
CYMar 20, 2021
Explaining decisions made with AI: A workbook (Use case 1: AI-assisted recruitment tool)David Leslie, Morgan Briggs
Over the last two years, The Alan Turing Institute and the Information Commissioner's Office (ICO) have been working together to discover ways to tackle the difficult issues surrounding explainable AI. The ultimate product of this joint endeavour, Explaining decisions made with AI, published in May 2020, is the most comprehensive practical guidance on AI explanation produced anywhere to date. We have put together this workbook to help support the uptake of that guidance. The goal of the workbook is to summarise some of main themes from Explaining decisions made with AI and then to provide the materials for a workshop exercise that has been built around a use case created to help you gain a flavour of how to put the guidance into practice. In the first three sections, we run through the basics of Explaining decisions made with AI. We provide a precis of the four principles of AI explainability, the typology of AI explanations, and the tasks involved in the explanation-aware design, development, and use of AI/ML systems. We then provide some reflection questions, which are intended to be a launching pad for group discussion, and a starting point for the case-study-based exercise that we have included as Appendix B. In Appendix A, we go into more detailed suggestions about how to organise the workshop. These recommendations are based on two workshops we had the privilege of co-hosting with our colleagues from the ICO and Manchester Metropolitan University in January 2021. The participants of these workshops came from both the private and public sectors, and we are extremely grateful to them for their energy, enthusiasm, and tremendous insight. This workbook would simply not exist without the commitment and keenness of all our collaborators and workshop participants.
HIST-PHFeb 6, 2021
The Arc of the Data Scientific UniverseDavid Leslie
In this paper I explore the scaffolding of normative assumptions that supports Sabina Leonelli's implicit appeal to the values of epistemic integrity and the global public good that conjointly animate the ethos of responsible and sustainable data work in the context of COVID-19. Drawing primarily on the writings of sociologist Robert K. Merton, the thinkers of the Vienna Circle, and Charles Sanders Peirce, I make some of these assumptions explicit by telling a longer story about the evolution of social thinking about the normative structure of science from Merton's articulation of his well-known norms (those of universalism, communism, organized skepticism, and disinterestedness) to the present. I show that while Merton's norms and his intertwinement of these with the underlying mechanisms of democratic order provide us with an especially good starting point to explore and clarify the commitments and values of science, Leonelli's broader, more context-responsive, and more holistic vision of the epistemic integrity of data scientific understanding, and her discernment of the global and biospheric scope of its moral-practical reach, move beyond Merton's schema in ways that effectively draw upon important critiques. Stepping past Merton, I argue that a combination of situated universalism, methodological pluralism, strong objectivity, and unbounded communalism must guide the responsible and sustainable data work of the future.
CYOct 5, 2020
Understanding bias in facial recognition technologiesDavid Leslie
Over the past couple of years, the growing debate around automated facial recognition has reached a boiling point. As developers have continued to swiftly expand the scope of these kinds of technologies into an almost unbounded range of applications, an increasingly strident chorus of critical voices has sounded concerns about the injurious effects of the proliferation of such systems. Opponents argue that the irresponsible design and use of facial detection and recognition technologies (FDRTs) threatens to violate civil liberties, infringe on basic human rights and further entrench structural racism and systemic marginalisation. They also caution that the gradual creep of face surveillance infrastructures into every domain of lived experience may eventually eradicate the modern democratic forms of life that have long provided cherished means to individual flourishing, social solidarity and human self-creation. Defenders, by contrast, emphasise the gains in public safety, security and efficiency that digitally streamlined capacities for facial identification, identity verification and trait characterisation may bring. In this explainer, I focus on one central aspect of this debate: the role that dynamics of bias and discrimination play in the development and deployment of FDRTs. I examine how historical patterns of discrimination have made inroads into the design and implementation of FDRTs from their very earliest moments. And, I explain the ways in which the use of biased FDRTs can lead distributional and recognitional injustices. The explainer concludes with an exploration of broader ethical questions around the potential proliferation of pervasive face-based surveillance infrastructures and makes some recommendations for cultivating more responsible approaches to the development and governance of these technologies.
MLSep 25, 2020
Stein Variational Gaussian ProcessesThomas Pinder, Christopher Nemeth, David Leslie
We show how to use Stein variational gradient descent (SVGD) to carry out inference in Gaussian process (GP) models with non-Gaussian likelihoods and large data volumes. Markov chain Monte Carlo (MCMC) is extremely computationally intensive for these situations, but the parametric assumptions required for efficient variational inference (VI) result in incorrect inference when they encounter the multi-modal posterior distributions that are common for such models. SVGD provides a non-parametric alternative to variational inference which is substantially faster than MCMC. We prove that for GP models with Lipschitz gradients the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior. Our method is demonstrated on benchmark problems in both regression and classification, a multimodal posterior, and an air quality example with 550,134 spatiotemporal observations, showing substantial performance improvements over MCMC and VI.
CYAug 15, 2020
Tackling COVID-19 through Responsible AI Innovation: Five Steps in the Right DirectionDavid Leslie
Innovations in data science and AI/ML have a central role to play in supporting global efforts to combat COVID-19. The versatility of AI/ML technologies enables scientists and technologists to address an impressively broad range of biomedical, epidemiological, and socioeconomic challenges. This wide-reaching scientific capacity, however, also raises a diverse array of ethical challenges. The need for researchers to act quickly and globally in tackling SARS-CoV-2 demands unprecedented practices of open research and responsible data sharing at a time when innovation ecosystems are hobbled by proprietary protectionism, inequality, and a lack of public trust. Moreover, societally impactful interventions like digital contact tracing are raising fears of surveillance creep and are challenging widely held commitments to privacy, autonomy, and civil liberties. Prepandemic concerns that data-driven innovations may function to reinforce entrenched dynamics of societal inequity have likewise intensified given the disparate impact of the virus on vulnerable social groups and the life-and-death consequences of biased and discriminatory public health outcomes. To address these concerns, I offer five steps that need to be taken to encourage responsible research and innovation. These provide a practice-based path to responsible AI/ML design and discovery centered on open, accountable, equitable, and democratically governed processes and products. When taken from the start, these steps will not only enhance the capacity of innovators to tackle COVID-19 responsibly, they will, more broadly, help to better equip the data science and AI/ML community to cope with future pandemics and to support a more humane, rational, and just society.
CYJun 11, 2019
Understanding artificial intelligence ethics and safetyDavid Leslie
A remarkable time of human promise has been ushered in by the convergence of the ever-expanding availability of big data, the soaring speed and stretch of cloud computing platforms, and the advancement of increasingly sophisticated machine learning algorithms. Innovations in AI are already leaving a mark on government by improving the provision of essential social goods and services from healthcare, education, and transportation to food supply, energy, and environmental management. These bounties are likely just the start. The prospect that progress in AI will help government to confront some of its most urgent challenges is exciting, but legitimate worries abound. As with any new and rapidly evolving technology, a steep learning curve means that mistakes and miscalculations will be made and that both unanticipated and harmful impacts will occur. This guide, written for department and delivery leads in the UK public sector and adopted by the British Government in its publication, 'Using AI in the Public Sector,' identifies the potential harms caused by AI systems and proposes concrete, operationalisable measures to counteract them. It stresses that public sector organisations can anticipate and prevent these potential harms by stewarding a culture of responsible innovation and by putting in place governance processes that support the design and implementation of ethical, fair, and safe AI systems. It also highlights the need for algorithmically supported outcomes to be interpretable by their users and made understandable to decision subjects in clear, non-technical, and accessible ways. Finally, it builds out a vision of human-centred and context-sensitive implementation that gives a central role to communication, evidence-based reasoning, situational awareness, and moral justifiability.