Saba Esnaashari

CY
h-index: 5
4 papers
19 citations
Novelty: 26%
AI Score: 21

4 Papers

CY Mar 17, 2023
A multidomain relational framework to guide institutional AI research and adoption

Vincent J. Straub, Deborah Morgan, Youmna Hashem et al.

Calls for new metrics, technical standards and governance mechanisms to guide the adoption of Artificial Intelligence (AI) in institutions and public administration are now commonplace. Yet, most research and policy efforts aimed at understanding the implications of adopting AI tend to prioritize only a handful of ideas; they do not fully connect all the different perspectives and topics that are potentially relevant. In this position paper, we contend that this omission stems, in part, from what we call the relational problem in socio-technical discourse: fundamental ontological issues have not yet been settled--including semantic ambiguity, a lack of clear relations between concepts and differing standard terminologies. This contributes to the persistence of disparate modes of reasoning to assess institutional AI systems, and the prevalence of conceptual isolation in the fields that study them including ML, human factors, social science and policy. After developing this critique, we offer a way forward by proposing a simple policy and research design tool in the form of a conceptual framework to organize terms across fields--consisting of three horizontal domains for grouping relevant concepts and related methods: Operational, Epistemic, and Normative. We first situate this framework against the backdrop of recent socio-technical discourse at two premier academic venues, AIES and FAccT, before illustrating how developing suitable metrics, standards, and mechanisms can be aided by operationalizing relevant concepts in each of these domains. Finally, we outline outstanding questions for developing this relational approach to institutional AI research and adoption.

CY Mar 24, 2023
'Team-in-the-loop': Ostrom's IAD framework 'rules in use' to map and measure contextual impacts of AI

Deborah Morgan, Youmna Hashem, John Francis et al.

This article explores how the 'rules in use' from Ostrom's Institutional Analysis and Development Framework (IAD) can be developed as a context analysis approach for AI. AI risk assessment frameworks increasingly highlight the need to understand existing contexts. However, these approaches do not frequently connect with established institutional analysis scholarship. We outline a novel direction illustrated through a high-level example to understand how clinical oversight is potentially impacted by AI. Much current thinking regarding oversight for AI revolves around the idea of decision makers being in-the-loop and, thus, having capacity to intervene to prevent harm. However, our analysis finds that oversight is complex, frequently exercised by teams of professionals, and relies upon explanation to elicit information. Professional bodies and liability also function as institutions of polycentric oversight. These are all impacted by the challenge of oversight of AI systems. The approach outlined has potential utility as a policy tool of context analysis aligned with the 'Govern and Map' functions of the National Institute of Standards and Technology (NIST) AI Risk Management Framework; however, further empirical research is needed. Our analysis illustrates the benefit of existing institutional analysis approaches in foregrounding team structures within oversight and, thus, in conceptions of 'human in the loop'.

CY Mar 18, 2024
AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions

Vincent J. Straub, Youmna Hashem, Jonathan Bright et al.

There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time-consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provides, meaning that automation efforts should focus on general procedures rather than services themselves, which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.
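The person-years figure in the abstract above can be sanity-checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the abstract does not say whether the one-minute saving applies to all 143 million complex transactions or only to the 84% judged highly automatable, and it does not state how many working hours make up a person-year. The code therefore computes the hours saved under both readings and reports the working hours per person-year that the quoted 1,200 figure would imply. All function and variable names are hypothetical.

```python
MINUTES_PER_HOUR = 60


def hours_saved(transactions: float, minutes_saved_per_transaction: float) -> float:
    """Total staff hours saved per year for a given transaction volume."""
    return transactions * minutes_saved_per_transaction / MINUTES_PER_HOUR


def implied_hours_per_person_year(total_hours: float, person_years: float) -> float:
    """Working hours per person-year implied by a quoted person-years figure."""
    return total_hours / person_years


# Figures quoted in the abstract.
COMPLEX_TRANSACTIONS = 143_000_000
AUTOMATABLE_SHARE = 0.84
PERSON_YEARS_QUOTED = 1_200

# Two possible readings of "per complex transaction" (an assumption on our part;
# the abstract does not say which one the authors used).
readings = {
    "all complex transactions": COMPLEX_TRANSACTIONS,
    "highly automatable subset": COMPLEX_TRANSACTIONS * AUTOMATABLE_SHARE,
}

if __name__ == "__main__":
    for label, volume in readings.items():
        total = hours_saved(volume, minutes_saved_per_transaction=1)
        implied = implied_hours_per_person_year(total, PERSON_YEARS_QUOTED)
        print(f"{label}: {total:,.0f} hours saved, "
              f"implying {implied:,.0f} working hours per person-year")
```

Under either reading, the implied working year falls between roughly 1,650 and 2,000 hours, which is consistent with a full-time job, so the quoted order of magnitude appears plausible.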

CL Nov 29, 2024
MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks

John Francis, Saba Esnaashari, Anton Poletaev et al.

Large language models (LLMs) have demonstrated remarkable capabilities in text analysis tasks, yet their evaluation on complex, real-world applications remains challenging. We define a set of tasks, Multi-Insight Multi-Document Extraction (MIMDE) tasks, which involve extracting an optimal set of insights from a document corpus and mapping these insights back to their source documents. This task is fundamental to many practical applications, from analyzing survey responses to processing medical records, where identifying and tracing key insights across documents is crucial. We develop an evaluation framework for MIMDE and introduce a novel set of complementary human and synthetic datasets to examine the potential of synthetic data for LLM evaluation. After establishing optimal metrics for comparing extracted insights, we benchmark 20 state-of-the-art LLMs on both datasets. Our analysis reveals a strong correlation (0.71) between the ability of LLMs to extract insights on our two datasets, but synthetic data fails to capture the complexity of document-level analysis. These findings offer crucial guidance for the use of synthetic data in evaluating text analysis systems, highlighting both its potential and limitations.