LGMar 29, 2023Code
Fairlearn: Assessing and Improving Fairness of AI SystemsHilde Weerts, Miroslav Dudík, Richard Edgar et al. · microsoft-research
Fairlearn is an open source project to help practitioners assess and improve fairness of artificial intelligence (AI) systems. The associated Python library, also named fairlearn, supports evaluation of a model's output across affected populations and includes several algorithms for mitigating fairness issues. Grounded in the understanding that fairness is a sociotechnical challenge, the project integrates learning resources that aid practitioners in considering a system's broader societal context.
HCJun 10, 2023
Investigating Practices and Opportunities for Cross-functional Collaboration around AI Fairness in Industry PracticeWesley Hanwen Deng, Nur Yildirim, Monica Chang et al. · cmu
An emerging body of research indicates that ineffective cross-functional collaboration -- the interdisciplinary work done by industry practitioners across roles -- represents a major barrier to addressing issues of fairness in AI design and development. In this research, we sought to better understand practitioners' current practices and tactics to enact cross-functional collaboration for AI fairness, in order to identify opportunities to support more effective collaboration. We conducted a series of interviews and design workshops with 23 industry practitioners spanning various roles from 17 companies. We found that practitioners engaged in bridging work to overcome frictions in understanding, contextualization, and evaluation around AI fairness across roles. In addition, in organizational contexts with a lack of resources and incentives for fairness work, practitioners often piggybacked on existing requirements (e.g., for privacy assessments) and AI development norms (e.g., the use of quantitative evaluation metrics), although they worry that these tactics may be fundamentally compromised. Finally, we draw attention to the invisible labor that practitioners take on as part of this bridging and piggybacking work to enact interdisciplinary collaboration for fairness. We close by discussing opportunities for both FAccT researchers and AI practitioners to better support cross-functional collaboration for fairness in the design and development of AI systems.
HCFeb 16, 2023
Human-Centered Responsible Artificial Intelligence: Current & Future TrendsMohammad Tahaei, Marios Constantinides, Daniele Quercia et al.
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends to advance this important area of research by fostering collaboration and sharing ideas.
LGJul 5, 2023
Scaling Laws Do Not ScaleFernando Diaz, Michael Madaio
Recent work has advocated for training AI models on ever-larger datasets, arguing that as the size of a dataset increases, the performance of a model trained on that dataset will correspondingly increase (referred to as "scaling laws"). In this paper, we draw on literature from the social sciences and machine learning to critically interrogate these claims. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. As the size of datasets used to train large AI models grows and AI systems impact ever larger groups of people, the number of distinct communities represented in training or evaluation datasets grows. It is thus even more likely that communities represented in datasets may have values or preferences not reflected in (or at odds with) the metrics used to evaluate model performance in scaling laws. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations -- threatening the validity of claims that model performance is improving at scale. We end the paper with implications for AI development: that the motivation for scraping ever-larger datasets may be based on fundamentally flawed assumptions about model performance. That is, models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models. We suggest opportunities for the field to rethink norms and values in AI development, resisting claims for universality of large models, fostering more local, small-scale designs, and other ways to resist the impetus towards scale in AI.
HCOct 2, 2023
The Participatory Turn in AI Design: Theoretical Foundations and the Current State of PracticeFernando Delgado, Stephen Yang, Michael Madaio et al.
Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.
HCFeb 10Code
Navigating Uncertainties: How GenAI Developers Document Their Models on Open-Source PlatformsNingjing Tang, Megan Li, Amy Winecoff et al.
Model documentation plays a crucial role in promoting transparency and responsible development of AI systems. With the rise of Generative AI (GenAI), open-source platforms have increasingly become hubs for hosting and distributing these models, prompting platforms like Hugging Face to develop dedicated model documentation guidelines that align with responsible AI principles. Despite these growing efforts, there remains a lack of understanding of how developers document their GenAI models on open-source platforms. Through interviews with 13 GenAI developers active on open-source platforms, we provide empirical insights into their documentation practices and challenges. Our analysis reveals that despite existing resources, developers of GenAI models still face multiple layers of uncertainties in their model documentation: (1) uncertainties about what specific content should be included; (2) uncertainties about how to effectively report key components of their models; and (3) uncertainties in deciding who should take responsibilities for various aspects of model documentation. Based on our findings, we discuss the implications for policymakers, open-source platforms, and the research community to support meaningful, effective and actionable model documentation in the GenAI era, including cultivating better community norms, building robust evaluation infrastructures, and clarifying roles and responsibilities.
8.7CYApr 8
Google, AI Literacy, and the Learning Sciences: Multiple Modes of Research, Industry, and Practice PartnershipsVictor R. Lee, Michael Madaio, Ben Garside et al.
Enabling AI literacy in the general population at scale is a complex challenge requiring multiple stakeholders and institutions collaborating together. Industry and technology companies are important actors with respect to AI, and as a field, we have the opportunity to consider how researchers and companies might be partners toward shared goals. In this symposium, we focus on a collection of partnership projects that all involve Google and all address AI literacy as a comparative set of examples. Through a combination of presentations, commentary, and moderated group discussion, the session, we will identify (1) at what points in the life cycle do research, practice, and industry partnerships clearly intersect; (2) what factors and histories shape the directional focus of the partnerships; and (3) where there may be future opportunities for new configurations of partnership that are jointly beneficial to all parties.
HCFeb 23, 2024
Farsight: Fostering Responsible AI Awareness During AI Application PrototypingZijie J. Wang, Chinmay Kulkarni, Lauren Wilcox et al. · gatech
Prompt-based interfaces for Large Language Models (LLMs) have made prototyping and building AI-powered applications easier than ever before. However, identifying potential harms that may arise from AI applications remains a challenge, particularly during prompt-based prototyping. To address this, we present Farsight, a novel in situ interactive tool that helps people identify potential harms from the AI applications they are prototyping. Based on a user's prompt, Farsight highlights news articles about relevant AI incidents and allows users to explore and edit LLM-generated use cases, stakeholders, and harms. We report design insights from a co-design study with 10 AI prototypers and findings from a user study with 42 AI prototypers. After using Farsight, AI prototypers in our user study are better able to independently identify potential harms associated with a prompt and find our tool more useful and usable than existing resources. Their qualitative feedback also highlights that Farsight encourages them to focus on end-users and think beyond immediate harms. We discuss these findings and reflect on their implications for designing AI prototyping experiences that meaningfully engage with AI harms. Farsight is publicly accessible at: https://PAIR-code.github.io/farsight.
CYMar 24, 2025
The Case for "Thick Evaluations" of Cultural Representation in AIRida Qadri, Mark Diaz, Ding Wang et al.
Generative AI image models have been increasingly evaluated for their (in)ability to represent non-Western cultures. We argue that these evaluations operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these 'thin' evaluations, we introduce the idea of 'thick evaluations': a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI images, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, by studying the 'thick' ways in which people interpret and assign meaning to images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding of representation underpinning AI evaluations and by co-constructing metrics with communities, bringing measurement in line with the experiences of communities on the ground.
HCAug 12, 2025
"Accessibility people, you go work on that thing of yours over there": Addressing Disability Inclusion in AI Product OrganizationsSanika Moharana, Cynthia L. Bennett, Erin Buehler et al.
The rapid emergence of generative AI has changed the way that technology is designed, constructed, maintained, and evaluated. Decisions made when creating AI-powered systems may impact some users disproportionately, such as people with disabilities. In this paper, we report on an interview study with 25 AI practitioners across multiple roles (engineering, research, UX, and responsible AI) about how their work processes and artifacts may impact end users with disabilities. We found that practitioners experienced friction when triaging problems at the intersection of responsible AI and accessibility practices, navigated contradictions between accessibility and responsible AI guidelines, identified gaps in data about users with disabilities, and gathered support for addressing the needs of disabled stakeholders by leveraging informal volunteer and community groups within their company. Based on these findings, we offer suggestions for new resources and process changes to better support people with disabilities as end users of AI.
AIDec 10, 2021
Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for SupportMichael Madaio, Lisa Egede, Hariharan Subramonyam et al.
Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders or domain experts, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.
AINov 1, 2021
Stakeholder Participation in AI: Beyond "Add Diverse Stakeholders and Stir"Fernando Delgado, Stephen Yang, Michael Madaio et al.
There is a growing consensus in HCI and AI research that the design of AI systems needs to engage and empower stakeholders who will be affected by AI. However, the manner in which stakeholders should participate in AI design is unclear. This workshop paper aims to ground what we dub a 'participatory turn' in AI design by synthesizing existing literature on participation and through empirical analysis of its current practices via a survey of recent published research and a dozen semi-structured interviews with AI researchers and practitioners. Based on our literature synthesis and empirical research, this paper presents a conceptual framework for analyzing participatory approaches to AI design and articulates a set of empirical findings that in ensemble detail out the contemporary landscape of participatory practice in AI design. These findings can help bootstrap a more principled discussion on how PD of AI should move forward across AI, HCI, and other research communities.
CYOct 19, 2021
Risks of AI Foundation Models in EducationSu Lin Blodgett, Michael Madaio
If the authors of a recent Stanford report (Bommasani et al., 2021) on the opportunities and risks of "foundation models" are to be believed, these models represent a paradigm shift for AI and for the domains in which they will supposedly be used, including education. Although the name is new (and contested (Field, 2021)), the term describes existing types of algorithmic models that are "trained on broad data at scale" and "fine-tuned" (i.e., adapted) for particular downstream tasks, and is intended to encompass large language models such as BERT or GPT-3 and computer vision models such as CLIP. Such technologies have the potential for harm broadly speaking (e.g., Bender et al., 2021), but their use in the educational domain is particularly fraught, despite the potential benefits for learners claimed by the authors. In section 3.3 of the Stanford report, Malik et al. argue that achieving the goal of providing education for all learners requires more efficient computational approaches that can rapidly scale across educational domains and across educational contexts, for which they argue foundation models are uniquely well-suited. However, evidence suggests that not only are foundation models not likely to achieve the stated benefits for learners, but their use may also introduce new risks for harm.
CYMay 18, 2021
Beyond "Fairness:" Structural (In)justice Lenses on AI for EducationMichael Madaio, Su Lin Blodgett, Elijah Mayfield et al.
Educational technologies, and the systems of schooling in which they are deployed, enact particular ideologies about what is important to know and how learners should learn. As artificial intelligence technologies -- in education and beyond -- may contribute to inequitable outcomes for marginalized communities, various approaches have been developed to evaluate and mitigate the harmful impacts of AI. However, we argue in this paper that the dominant paradigm of evaluating fairness on the basis of performance disparities in AI models is inadequate for confronting the systemic inequities that educational AI systems (re)produce. We draw on a lens of structural injustice informed by critical theory and Black feminist scholarship to critically interrogate several widely-studied and widely-adopted categories of educational AI and explore how they are bound up in and reproduce historical legacies of structural injustice and inequity, regardless of the parity of their models' performance. We close with alternative visions for a more equitable future for educational AI.