CY · Apr 13
Epistemic Trust as a Mechanism for Ethics Integration: Failure Modes and Design Principles from 70 Moral Imagination Workshops
Benjamin Lange, Geoff Keeling, Kyle Pedersen et al.
Bottom-up responsible innovation initiatives seek to empower technology development teams to engage in ethical reflection, yet such interventions frequently fail to achieve practitioner engagement. Why do some ethics interventions succeed while others are dismissed as irrelevant, adversarial, or disconnected from work? This paper proposes epistemic trust -- the degree to which practitioners regard an intervention, its facilitators, and its content as credible, relevant, and actionable -- as a conceptual model linking intervention design to engagement outcomes. Drawing on philosophical work on testimony and on practice-based qualitative analysis of over 70 moral imagination workshops with engineering teams between 2019 and 2025, we identify five dimensions of epistemic trust salient to ethics interventions (Relevance, Inclusivity, Agency, Authority, and Alignment) and present a typology of 23 failure modes that arise when these dimensions are inadequately addressed. We derive nine design principles for cultivating epistemic trust, grounded in our operationalisation of moral imagination through technomoral scenarios and structured deliberation. Our findings contribute to the literature on collaborative socio-technical integration by specifying conditions of uptake that existing frameworks leave undertheorised. We acknowledge limitations including selection effects from voluntary participation and the absence of formal outcome measures, and position our failure mode typology as practitioner hypotheses warranting further empirical validation.
CY · Apr 23, 2024
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery et al.
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
CL · Jan 17, 2024
Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect
Lize Alberts, Geoff Keeling, Amanda McCroskery
With the growing popularity of conversational agents based on large language models (LLMs), we need to ensure their behaviour is ethical and appropriate. Work in this area largely centres around the 'HHH' criteria: making outputs more helpful and honest, and avoiding harmful (biased, toxic, or inaccurate) statements. Whilst this semantic focus is useful when viewing LLM agents as mere mediums or output-generating systems, it fails to account for pragmatic factors that can make the same speech act seem more or less tactless or inconsiderate in different social situations. With the push towards agentic AI, wherein systems become increasingly proactive in chasing goals and performing actions in the world, considering the pragmatics of interaction becomes essential. We propose an interactional approach to ethics that is centred on relational and situational factors. We explore what it means for a system, as a social actor, to treat an individual respectfully in a (series of) interaction(s). Our work anticipates a set of largely unexplored risks at the level of situated social interaction, and offers practical suggestions to help agentic LLM technologies treat people well.