Malak Sadek

HC
5 papers
32 citations
Novelty 24%
AI Score 31

5 Papers

CY Apr 12, 2023
Positive AI: Key Challenges in Designing Artificial Intelligence for Wellbeing

Willem van der Maden, Derek Lomas, Malak Sadek et al.

Artificial Intelligence (AI) is a double-edged sword: on one hand, AI promises to provide great advances that could benefit humanity, but on the other hand, AI poses substantial (even existential) risks. With advancements happening daily, many people are increasingly worried about AI's impact on their lives. To ensure AI progresses beneficially, some researchers have proposed "wellbeing" as a key objective to govern AI. This article addresses key challenges in designing AI for wellbeing. We group these challenges into issues of modeling wellbeing in context, assessing wellbeing in context, designing interventions to improve wellbeing, and maintaining AI alignment with wellbeing over time. The identification of these challenges provides a scope for efforts to help ensure that AI developments are aligned with human wellbeing.

HC Oct 18, 2023
The Value-Sensitive Conversational Agent Co-Design Framework

Malak Sadek, Rafael A. Calvo, Céline Mougenot

Conversational agents (CAs) are gaining traction in both industry and academia, especially with the advent of generative AI and large language models. As these agents are used more broadly by members of the general public and take on a number of critical use cases and social roles, it becomes important to consider the values embedded in these systems. This consideration includes answering questions such as 'whose values get embedded in these agents?' and 'how do those values manifest in the agents being designed?' Accordingly, the aim of this paper is to present the Value-Sensitive Conversational Agent (VSCA) Framework for enabling the collaborative design (co-design) of value-sensitive CAs with relevant stakeholders. Firstly, requirements for co-designing value-sensitive CAs which were identified in previous works are summarised here. Secondly, the practical framework is presented and discussed, including its operationalisation into a design toolkit. The framework facilitates the co-design of three artefacts that elicit stakeholder values and have a technical utility to CA teams to guide CA implementation, enabling the creation of value-embodied CA prototypes. Finally, an evaluation protocol for the framework is proposed where the effects of the framework and toolkit are explored in a design workshop setting to evaluate both the process followed and the outcomes produced.

SE Jan 25
Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild

Willem van der Maden, Malak Sadek, Ziang Xiao et al.

How do product teams evaluate LLM-powered products? As organizations integrate large language models (LLMs) into digital products, their unpredictable nature makes traditional evaluation approaches inadequate, yet little is known about how practitioners navigate this challenge. Through interviews with nineteen practitioners across diverse sectors, we identify ten evaluation practices spanning informal 'vibe checks' to organizational meta-work. Beyond confirming four documented challenges, we introduce a novel fifth we call the results-actionability gap, in which practitioners gather evaluation data but cannot translate findings into concrete improvements. Drawing on patterns from successful teams, we contribute strategies to bridge this gap, supporting practitioners' formalization journey from ad-hoc interpretive practices (e.g., vibe checks) toward systematic evaluation. Our analysis suggests these interpretive practices are necessary adaptations to LLM characteristics rather than methodological failures. For HCI researchers, this presents a research opportunity to support practitioners in systematizing emerging practices rather than developing new evaluation frameworks.

HC Nov 15, 2023
Exploring Links between Conversational Agent Design Challenges and Interdisciplinary Collaboration

Malak Sadek, Céline Mougenot

Recent years have seen a steady rise in the popularity and use of Conversational Agents (CA) for different applications, well before the more immediate impact of large language models. This rise has been accompanied by an extensive exploration and documentation of the challenges of designing and creating conversational agents. Focusing on a recent scoping review of the socio-technical challenges of CA creation, this opinion paper calls for an examination of the extent to which interdisciplinary collaboration (IDC) challenges might contribute towards socio-technical CA design challenges. The paper proposes a taxonomy of CA design challenges using IDC as a lens, and proposes practical strategies to overcome them which complement existing design principles. The paper invites future work to empirically verify suggested conceptual links and apply the proposed strategies within the space of CA design to evaluate their effectiveness.

HC Jun 24, 2024
Modulating Language Model Experiences through Frictions

Katherine M. Collins, Valerie Chen, Ilia Sucholutsky et al.

Language models are transforming the ways that their users engage with the world. Despite impressive capabilities, over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term. How can we develop scaffolding around language models to curate more appropriate use? We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse. Frictions involve small modifications to a user's experience, e.g., the addition of a button impeding model access and reminding a user of their expertise relative to the model. Through a user study with real humans, we observe shifts in user behavior from the imposition of a friction over LLMs in the context of a multi-topic question-answering task as a representative task that people may use LLMs for, e.g., in education and information retrieval. We find that frictions modulate over-reliance by driving down users' click rates while minimally affecting accuracy for those topics. Yet, frictions may have unintended effects. We find marked differences in users' click behaviors even on topics where frictions were not provisioned. Our contributions motivate further study of human-AI behavioral interaction to inform more effective and appropriate LLM use.