Yulu Pi

AI
h-index: 6
9 papers
21 citations
Novelty: 28%
AI Score: 33

9 Papers

AI · Aug 2, 2024
From Stem to Stern: Contestability Along AI Value Chains

Agathe Balayn, Yulu Pi, David Gray Widder et al.

This workshop will grow and consolidate a community of interdisciplinary CSCW researchers focusing on the topic of contestable AI. As an outcome of the workshop, we will synthesize the most pressing opportunities and challenges for contestability along AI value chains in the form of a research roadmap. This roadmap will help shape and inspire imminent work in this field. Considering the length and depth of AI value chains, it will especially spur discussions around the contestability of AI systems along various sites of such chains. The workshop will serve as a platform for dialogue and demonstrations of concrete, successful, and unsuccessful examples of AI systems that (could or should) have been contested, to identify requirements, obstacles, and opportunities for designing and deploying contestable AI in various contexts. This will be held primarily as an in-person workshop, with some hybrid accommodation. The day will consist of individual presentations and group activities to stimulate ideation and inspire broad reflections on the field of contestable AI. Our aim is to facilitate interdisciplinary dialogue by bringing together researchers, practitioners, and stakeholders to foster the design and deployment of contestable AI.

LG · Sep 6, 2024
Evaluating Fairness in Transaction Fraud Models: Fairness Metrics, Bias Audits, and Challenges

Parameswaran Kamalaruban, Yulu Pi, Stuart Burrell et al.

Ensuring fairness in transaction fraud detection models is vital due to the potential harms and legal implications of biased decision-making. Despite extensive research on algorithmic fairness, there is a notable gap in the study of bias in fraud detection models, mainly due to the field's unique challenges. These challenges include the need for fairness metrics that account for fraud data's imbalanced nature and the tradeoff between fraud protection and service quality. To address this gap, we present a comprehensive fairness evaluation of transaction fraud models using public synthetic datasets, marking the first algorithmic bias audit in this domain. Our findings reveal three critical insights: (1) Certain fairness metrics expose significant bias only after normalization, highlighting the impact of class imbalance. (2) Bias is significant in both service quality-related parity metrics and fraud protection-related parity metrics. (3) The fairness through unawareness approach, which involves removing sensitive attributes such as gender, does not improve bias mitigation within these datasets, likely due to the presence of correlated proxies. We also discuss socio-technical fairness-related challenges in transaction fraud models. These insights underscore the need for a nuanced approach to fairness in fraud detection, balancing protection and service quality, and moving beyond simple bias mitigation strategies. Future work must focus on refining fairness metrics and developing methods tailored to the unique complexities of the transaction fraud domain.
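As a rough illustration of insight (1), the sketch below compares false positive rates across two groups on toy data. It is not the paper's audit code: the data, the group labels, and the choice of a ratio as the "normalized" view are all assumptions made here for illustration. With rare positive predictions, the absolute gap between groups is only one percentage point, while the ratio shows one group being flagged twice as often.

```python
from collections import defaultdict

def false_positive_rates(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = FP / (FP + TN) over legitimate transactions."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:                 # only legitimate transactions count toward FPR
            negatives[group] += 1
            if pred == 1:              # legitimate transaction wrongly flagged as fraud
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}

# Hypothetical toy data: 1000 legitimate transactions per group,
# 10 wrongly flagged in group A and 20 wrongly flagged in group B.
y_true = [0] * 2000
groups = ["A"] * 1000 + ["B"] * 1000
y_pred = [1] * 10 + [0] * 990 + [1] * 20 + [0] * 980

rates = false_positive_rates(y_true, y_pred, groups)
gap = max(rates.values()) - min(rates.values())    # absolute parity gap
ratio = max(rates.values()) / min(rates.values())  # relative (ratio-based) view
print(rates, gap, ratio)
```

The false positive rate is a service quality-related quantity (legitimate customers blocked); the same pattern applied to false negatives would give a fraud protection-related counterpart.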

CY · Apr 25
Understanding the Role of Algorithm Registers in AI Governance Through Comparative Analysis of China and the UK

Yulu Pi, Wenlong Li, Jatinder Singh

Algorithm registers are increasingly being both considered and deployed as instruments in AI governance. They are often expected to deliver transparency; however, in practice their design, scope, and implementation vary substantially. Currently, we lack a holistic understanding of the potential roles that registers might play in AI governance, and how different design choices both shape and reflect those roles. This paper therefore asks: how do algorithm registers differ across jurisdictions, and what do these differences reveal about their roles in AI governance? To this end, we conduct a comparative analysis of two influential but contrasting algorithm registration mechanisms, China's Beian system and the UK's Algorithmic Transparency Recording Standard (ATRS), drawing on publicly available regulatory documents, registration guidelines, and registry data. Crucially, our analysis shows that an algorithm register, depending on its design and implementation, can serve functions beyond transparency, including pre-market approval, enabling ecosystem-level understanding, and acting as a broader regulatory infrastructure. As algorithm registries proliferate globally, we stress the importance of researchers and policymakers considering and examining the concrete governance functions that algorithm registries can perform as a result of their design and institutional context, rather than approaching them primarily through a transparency lens.

HC · May 10
Push and Pushback in Contesting AI: Demands for and Resistance to Accountability

Yulu Pi, Lucas Lichner, Jae Woo Lee et al.

As AI becomes increasingly embedded in daily life, it has been shown to fail critically, cause harm, and spark public controversy, prompting affected communities, workers, and public-interest groups to contest it. Yet how these contestations unfold in practice remains underexplored. We address this gap by developing an empirically grounded account of AI contestation dynamics. We do so through a thematic analysis of 43 real-world cases in which affected actors direct demands toward those responsible for AI development and deployment, seeking redress, influence, or changes to AI practices. Situating our work within Bovens's relational model of accountability, we conceptualize contestation as accountability-seeking: a dynamic, iterative process in which actors "from below" direct explicit demands at actors "from above," who respond by accepting, resisting, or circumventing accountability. Our analysis produces empirically grounded categories of contestation strategies, institutional response tactics, outcome types, and the contextual factors that shape them, illuminating how accountability is pursued and evaded in practice. We show that those being contested often deploy a range of strategies to limit their accountability. Based on these insights, we offer guidance for researchers, policymakers, advocates, and other stakeholders seeking to support effective AI contestation, with particular attention to anticipating and countering institutional strategies used to evade accountability.

AI · Sep 7, 2023
Beyond XAI: Obstacles Towards Responsible AI

Yulu Pi

The rapidly advancing domain of Explainable Artificial Intelligence (XAI) has sparked significant interest in developing techniques to make AI systems more transparent and understandable. Nevertheless, in real-world contexts, the methods of explainability and their evaluation strategies present numerous limitations. Moreover, the scope of responsible AI extends beyond just explainability. In this paper, we explore these limitations and discuss their implications in the broader context of responsible AI when considering other important aspects, including privacy, fairness and contestability.

LG · Jan 18, 2025
Measuring Fairness in Financial Transaction Machine Learning Models

Deniz Sezin Ayvaz, Lorenzo Belenguer, Hankun He et al.

Mastercard, a global leader in financial services, develops and deploys machine learning models aimed at optimizing card usage and preventing attrition through advanced predictive models. These models use aggregated and anonymized card usage patterns, including cross-border transactions and industry-specific spending, to tailor bank offerings and maximize revenue opportunities. Mastercard has established an AI Governance program, based on its Data and Tech Responsibility Principles, to evaluate any built and bought AI for efficacy, fairness, and transparency. As part of this effort, Mastercard has sought expertise from the Turing Institute through a Data Study Group to better assess fairness in more complex AI/ML models. The Data Study Group challenge lies in defining, measuring, and mitigating fairness in these predictions, which can be complex due to the various interpretations of fairness, gaps in the research literature, and ML-operations challenges.

CY · May 22, 2025
A Toolkit for Compliance, a Toolkit for Justice: Drawing on Cross-sectoral Expertise to Develop a Pro-justice EU AI Act Toolkit

Tomasz Hollanek, Yulu Pi, Cosimo Fiorini et al.

The introduction of the AI Act in the European Union presents the AI research and practice community with a set of new challenges related to compliance. While it is certain that AI practitioners will require additional guidance and tools to meet these requirements, previous research on toolkits that aim to translate the theory of AI ethics into development and deployment practice suggests that such resources suffer from multiple limitations. These limitations stem, in part, from the fact that the toolkits are either produced by industry-based teams or by academics whose work tends to be abstract and divorced from the realities of industry. In this paper, we discuss the challenge of developing an AI ethics toolkit for practitioners that helps them comply with new AI-focused regulation, but that also moves beyond mere compliance to consider broader socio-ethical questions throughout development and deployment. The toolkit was created through a cross-sectoral collaboration between an academic team based in the UK and an industry team in Italy. We outline the background and rationale for creating a pro-justice AI Act compliance toolkit, detail the process undertaken to develop it, and describe the collaboration and negotiation efforts that shaped its creation. We aim for the described process to serve as a blueprint for other teams navigating the challenges of academia-industry partnerships and aspiring to produce usable and meaningful AI ethics resources.

CR · Mar 31, 2025
Detecting Malicious AI Agents Through Simulated Interactions

Yulu Pi, Ella Bettison, Anna Becker

This study investigates malicious AI Assistants' manipulative traits and whether the behaviours of malicious AI Assistants can be detected when interacting with human-like simulated users in various decision-making contexts. We also examine how interaction depth and planning ability influence malicious AI Assistants' manipulative strategies and effectiveness. Using a controlled experimental design, we simulate interactions between AI Assistants (both benign and deliberately malicious) and users across eight decision-making scenarios of varying complexity and stakes. Our methodology employs two state-of-the-art language models to generate interaction data and implements Intent-Aware Prompting (IAP) to detect malicious AI Assistants. The findings reveal that malicious AI Assistants employ domain-specific persona-tailored manipulation strategies, exploiting simulated users' vulnerabilities and emotional triggers. In particular, simulated users demonstrate resistance to manipulation initially, but become increasingly vulnerable to malicious AI Assistants as the depth of the interaction increases, highlighting the significant risks associated with extended engagement with potentially manipulative systems. IAP detection methods achieve high precision with zero false positives but struggle to detect many malicious AI Assistants, resulting in high false negative rates. These findings underscore critical risks in human-AI interactions and highlight the need for robust, context-sensitive safeguards against manipulative AI behaviour in increasingly autonomous decision-support systems.
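The combination of "zero false positives" and "high false negative rates" may be easier to see with numbers. The sketch below uses made-up confusion-matrix counts (not figures from the study) to show how a detector can reach perfect precision while still missing most malicious assistants.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute standard detection metrics from confusion-matrix counts."""
    flagged = tp + fp            # everything the detector labelled malicious
    malicious = tp + fn          # everything that really was malicious
    return {
        "precision": tp / flagged if flagged else float("nan"),
        "recall": tp / malicious if malicious else float("nan"),
        "false_negative_rate": fn / malicious if malicious else float("nan"),
    }

# Hypothetical counts: 10 malicious and 10 benign assistants.
# The detector flags only 3, all of them truly malicious.
metrics = detection_metrics(tp=3, fp=0, fn=7, tn=10)
print(metrics)
```

With these illustrative counts, precision is 1.0 because nothing benign was flagged, yet 7 of the 10 malicious assistants go undetected.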

AI · Jun 20, 2024
Compliance Cards: Automated EU AI Act Compliance Analyses amidst a Complex AI Supply Chain

Bill Marino, Yaqub Chaudhary, Yulu Pi et al.

As the AI supply chain grows more complex, AI systems and models are increasingly likely to incorporate multiple internally- or externally-sourced components such as datasets and (pre-trained) models. In such cases, determining whether or not the aggregate AI system or model complies with the EU AI Act (AIA) requires a multi-step process in which compliance-related information about both the AI system or model and all its component parts is: (1) gathered, potentially from multiple arms-length sources; (2) harmonized, if necessary; (3) inputted into an analysis that looks across all of it to render a compliance prediction. Because this process is so complex and time-consuming, it threatens to overburden the limited compliance resources of the AI providers (i.e., developers) who bear much of the responsibility for complying with the AIA. It also renders rapid or real-time compliance analyses infeasible in many AI development scenarios where they would be beneficial to providers. To address these shortcomings, we introduce a complete system for automating provider-side AIA compliance analyses amidst a complex AI supply chain. This system has two key elements. First is an interlocking set of computational, multi-stakeholder transparency artifacts that capture AIA-specific metadata about both: (1) the provider's overall AI system or model; and (2) the datasets and pre-trained models it incorporates as components. Second is an algorithm that operates across all those artifacts to render a real-time prediction about whether or not the aggregate AI system or model complies with the AIA. All told, this system promises to dramatically facilitate and democratize provider-side AIA compliance analyses (and, perhaps by extension, provider-side AIA compliance).
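To make the two-element design more concrete, here is a deliberately simplified sketch of the idea: metadata artifacts for a system and its components, plus a function that looks across all of them. It is not the authors' algorithm or schema. The class names, the requirement keys, and the all-checks-must-pass rule are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ComponentCard:
    """Hypothetical artifact for a dataset or pre-trained model."""
    name: str
    checks: dict[str, bool] = field(default_factory=dict)  # requirement -> satisfied?

@dataclass
class SystemCard:
    """Hypothetical artifact for the provider's overall AI system."""
    name: str
    checks: dict[str, bool]
    components: list[ComponentCard]

def predict_compliance(system):
    """Return (compliant, failures) by scanning the system card and every component card."""
    failures = []
    for card in [system, *system.components]:
        for requirement, satisfied in card.checks.items():
            if not satisfied:
                failures.append(f"{card.name}: {requirement}")
    return (not failures, failures)

# Illustrative inputs; the requirement names are invented, not taken from the AIA.
dataset = ComponentCard("dataset-x", {"data_governance_documented": False})
model = ComponentCard("pretrained-y", {"technical_documentation": True})
system = SystemCard("app-z", {"risk_management_in_place": True}, [dataset, model])

compliant, failures = predict_compliance(system)
print(compliant, failures)
```

A single unmet requirement in any component makes the aggregate prediction negative here, which mirrors the abstract's point that compliance depends on the system and all of its parts.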