Margaret Burnett

HC
h-index: 2
13 papers
227 citations
Novelty: 36%
AI Score: 42

13 Papers

HC · Aug 23, 2024
How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

Sujay Koujalgi, Andrew Anderson, Iyadunni Adenuga et al.

Assessing an AI system's behavior -- particularly in Explainable AI Systems -- is sometimes done empirically, by measuring people's abilities to predict the agent's next move -- but how to perform such measurements? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing fails to capture the nuances of the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then use these bases to perform two analyses on sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second is a re-analysis of a prior study on a size-4 action space. Other researchers adopting our operationalization of the prediction task and analysis methodology will improve the rigor of user studies conducted with that task, which is particularly important when the domain features a large output space.
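
The floor effect described in the abstract above can be illustrated with a small sketch. Under binary scoring, a uniformly random guess is correct with probability 1/N for an action space of size N, so the chance baseline falls from 1/4 at size 4 to 1/36 at size 36. The function `rank_partial_credit` is a hypothetical illustration of scoring "partial wrongness" by how highly the agent ranked the predicted action; it is an assumption made for this example and is not one of the paper's three mathematical bases, which the abstract does not specify.

```python
from fractions import Fraction


def chance_accuracy(num_actions: int) -> Fraction:
    """Expected binary accuracy of a uniformly random guess over num_actions."""
    if num_actions < 1:
        raise ValueError("num_actions must be positive")
    return Fraction(1, num_actions)


def binary_score(predicted: int, actual: int) -> float:
    """Binary framing: full credit for an exact match, nothing otherwise."""
    return 1.0 if predicted == actual else 0.0


def rank_partial_credit(predicted: int, agent_ranking: list[int]) -> float:
    """Hypothetical partial-credit score in [0, 1] (not taken from the paper).

    agent_ranking lists action ids from the agent's most preferred to its
    least preferred. The top-ranked action earns 1.0, the last-ranked action
    earns 0.0, and credit falls linearly with rank position in between.
    """
    n = len(agent_ranking)
    if n < 2:
        raise ValueError("need at least two actions to rank")
    position = agent_ranking.index(predicted)
    return 1.0 - position / (n - 1)


if __name__ == "__main__":
    # Chance baselines for the two action-space sizes named in the abstract.
    for size in (4, 36):
        print(size, chance_accuracy(size))

    # A near-miss: the participant picks the agent's second choice out of 36.
    ranking = list(range(36))  # assumed ranking; action 0 is the agent's move
    print(binary_score(1, ranking[0]))
    print(rank_partial_credit(1, ranking))
```

In this sketch, the near-miss earns nothing under binary scoring but most of the credit under the rank-based score, which is the kind of distinction a binary framing cannot express.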

HC · Apr 10
Thinking Less, Trusting More: GenAI's Impacts on Students' Cognitive Habits

Rudrajit Choudhuri, Christopher Sanchez, Margaret Burnett et al.

Objectives: When students use generative AI in coursework, what are its persistent effects on their intellectual development? We investigate (RQ1-How) how students' trust in and routine use of genAI affect their cognitive engagement habits in STEM coursework, and (RQ2-Who) which students are particularly vulnerable to cognitive disengagement. Method: Drawing on dual-process, cognitive offloading, and automation bias theories, we developed a statistical model explaining how and to what extent students' trust-driven routine genAI use affected their cognitive engagement -- specifically, reflection, the need for understanding, and critical thinking in coursework, and how these effects differed across students' cognitive styles. We empirically evaluated this model using Partial Least Squares Structural Equation Modeling on survey data from 299 STEM students across five North American universities. Results: Students who trusted and routinely used genAI reported significantly lower cognitive engagement. Unexpectedly, students with higher technophilic motivations, risk tolerance, and computer self-efficacy -- traits often celebrated in STEM -- were more prone to these effects. Interestingly, students' prior experience with genAI or academia did not protect them from cognitively disengaging. Implications: Our findings suggest a potential cognitive debt cycle where routine genAI use weakens students' intellectual habits, potentially driving and escalating over-reliance. This poses challenges for curricula and genAI system design, requiring interventions that actively support cognitive engagement.

SE · Feb 27, 2022 · Code
How to Debug Inclusivity Bugs? A Debugging Process with Information Architecture

Mariam Guizani, Igor Steinmacher, Jillian Emard et al.

Although some previous research has found ways to find inclusivity bugs (biases in software that introduce inequities), little attention has been paid to how to go about fixing such bugs. Without a process to move from finding to fixing, acting upon such findings is an ad-hoc activity, at the mercy of the skills of each individual developer. To address this gap, we created Why/Where/Fix, a systematic inclusivity debugging process whose inclusivity fault localization harnesses Information Architecture (IA) -- the way user-facing information is organized, structured and labeled. We then conducted a multi-stage qualitative empirical evaluation of the effectiveness of Why/Where/Fix, using an Open Source Software (OSS) project's infrastructure as our setting. In our study, the OSS project team used the Why/Where/Fix process to find inclusivity bugs, localize the IA faults behind them, and then fix the IA to remove the inclusivity bugs they had found. Our results showed that using Why/Where/Fix reduced the number of inclusivity bugs that OSS newcomer participants experienced by 90%.

HC · Oct 21, 2025
"Over-the-Hood" AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed Them

Andrew Anderson, Fatima A. Moussaoui, Jimena Noa Guevara et al.

While much research has shown the presence of AI's "under-the-hood" biases (e.g., algorithmic, training data, etc.), what about "over-the-hood" inclusivity biases: barriers in user-facing AI products that disproportionately exclude users with certain problem-solving approaches? Recent research has begun to report the existence of such biases -- but what do they look like, how prevalent are they, and how can developers find and fix them? To find out, we conducted a field study with 3 AI product teams, to investigate what kinds of AI inclusivity bugs exist uniquely in user-facing AI products, and whether/how AI product teams might harness an existing (non-AI-oriented) inclusive design method to find and fix them. The teams' work resulted in identifying 6 types of AI inclusivity bugs arising 83 times, fixes covering 47 of these bug instances, and a new variation of the GenderMag inclusive design method, GenderMag-for-AI, that is especially effective at detecting certain kinds of AI inclusivity bugs.

HC · Jan 25, 2022
Intersectionality Goes Analytical: Taming Combinatorial Explosion Through Type Abstraction

Margaret Burnett, Martin Erwig, Abrar Fallatah et al.

HCI researchers' and practitioners' awareness of intersectionality has been expanding, producing knowledge, recommendations, and prototypes for supporting intersectional populations. However, doing intersectional HCI work is uniquely expensive: it leads to a combinatorial explosion of empirical work (expense 1), and little of the work on one intersectional population can be leveraged to serve another (expense 2). In this paper, we explain how representations employed by certain analytical design methods correspond to type abstractions, and use that correspondence to identify a (de)compositional model in which a population's diverse identity properties can be joined and split. We formally prove the model's correctness, and show how it enables HCI designers to harness existing analytical HCI methods for use on new intersectional populations of interest. We illustrate, through four design use-cases, how the model can reduce the amount of expense 1 and enable designers to apply prior work to new intersectional populations, addressing expense 2.

HC · Aug 30, 2021
Toward an Actionable Socioeconomic-Aware HCI

Margaret Burnett, Abrar Fallatah, Catherine Hu et al.

Although inequities for individuals in different socioeconomic situations are starting to capture widespread attention, less attention has been given to the socioeconomic inequities that saturate socioeconomic-diverse individuals' user experiences. To enable HCI practitioners to attend to such inequities and avoid unwittingly introducing them, in this paper we consider a wide body of research relevant to how an individual's socioeconomic status (SES) can affect their user experiences with technology. We synthesize this foundational research to produce a core set of 6 evidence-based SES "facets" (attribute types and value ranges) that directly relate to user experiences for individuals in different SES strata. We then harness these SES facets to produce actionable paths forward -- including a new structured method we call SocioeconomicMag -- by which HCI researchers and practitioners can bring new socioeconomic-aware practices into their everyday HCI work.

HC · Aug 2, 2021
Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving Styles

Andrew Anderson, Jimena Noa Guevara, Fatima Moussaoui et al.

Motivations: Recent research has emerged on how to improve AI product user experiences in general, but relatively little is known about an AI product's inclusivity. For example, what kinds of users does it support well, and who does it leave out? And what changes in the product would make it more inclusive? Objectives: Our overall objective is to help fill this gap, investigating what kinds of diverse users an AI product leaves out, and how to act upon that knowledge. To bring actionability to our findings, we focus on users' diversity of problem-solving attributes. Thus, our specific objectives were: (1) to reveal whether participants with diverse problem-solving styles were left behind in a set of AI products; and (2) to relate participants' problem-solving diversity to their demographic diversity, specifically, gender and age. Methods: We performed 18 experiments, discarding two that failed manipulation checks. Each experiment was a 2x2 factorial experiment with online participants. Each experiment compared two AI products: one deliberately violating an HAI guideline and the other applying the guideline. For our first objective, we analyzed how much each AI product gained/lost inclusivity compared to its counterpart, where inclusivity was supportiveness to participants with particular problem-solving styles. For our second objective, we analyzed how participants' problem-solving styles aligned with their demographics, namely their genders and ages. Results & Implications: Participants' diverse problem-solving styles revealed six types of inclusivity results: (1) the AI products that followed an HAI guideline were almost always more inclusive across diversity of problem-solving styles than the products that did not follow that guideline -- but the "who" that got most of the inclusivity varied widely by guideline and by problem-solving style...

SE · May 27, 2019
Engineering Gender-Inclusivity into Software: Tales from the Trenches

Claudia Hilderbrand, Christopher Perdriau, Lara Letaw et al.

Although the need for gender-inclusivity in software itself is gaining attention among both SE researchers and SE practitioners, and methods have been published to help, little has been reported on how to make such methods work in real-world settings. For example, how do busy software practitioners use such methods in low-cost ways? How do they endeavor to maximize benefits from using them? How do they avoid the controversies that can arise in talking about gender? To find out how teams were handling these and similar questions, we turned to 10 real-world software teams. We present these teams' experiences "in the trenches," in the form of 12 practices and 3 potential pitfalls, so as to provide their insights to other real-world software teams trying to engineer gender-inclusivity into their software products.

HC · May 7, 2019
Fixing Inclusivity Bugs for Information Processing Styles and Learning Styles

Zoe Steine-Hanson, Claudia Hilderbrand, Lara Letaw et al.

Most software systems today do not support cognitive diversity. Further, because of differences in problem-solving styles that cluster by gender, software that poorly supports cognitive diversity can also embed gender biases. To help software professionals fix gender bias "bugs" related to people's problem-solving styles for information processing and learning of new software, we collected inclusivity fixes from three sources. The first two are empirical studies we conducted: a heuristics-driven user study and a field research industry study. The third is data that we obtained about a before/after user study of inclusivity bugs. The resulting seven potential inclusivity fixes show how to debug software to be more inclusive for diverse problem-solving styles.

HC · May 7, 2019
From GenderMag to InclusiveMag: An Inclusive Design Meta-Method

Christopher Mendez, Lara Letaw, Margaret Burnett et al.

How can software practitioners assess whether their software supports diverse users? Although there are empirical processes that can be used to find "inclusivity bugs" piecemeal, what is often needed is a systematic inspection method to assess software's support for diverse populations. To help fill this gap, this paper introduces InclusiveMag, a generalization of GenderMag that can be used to generate systematic inclusiveness methods for a particular dimension of diversity. We then present a multi-case study covering eight diversity dimensions, of eight teams' experiences applying InclusiveMag to eight under-served populations and their "mainstream" counterparts.

HC · Mar 22, 2019
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

Andrew Anderson, Jonathan Dodge, Amrita Sadarangani et al.

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

HC · Nov 21, 2017
Toward Foraging for Understanding of StarCraft Agents: An Empirical Study

Sean Penney, Jonathan Dodge, Claudia Hilderbrand et al.

Assessing and understanding intelligent agents is a difficult task for users who lack an AI background. A relatively new area, called "Explainable AI," is emerging to help address this problem, but little is known about how users would forage through information an explanation system might offer. To inform the development of Explainable AI systems, we conducted a formative study, using the lens of Information Foraging Theory, into how experienced users foraged in the domain of StarCraft to assess an agent. Our results showed that participants faced difficult foraging problems. These foraging problems caused participants to entirely miss events that were important to them, reluctantly choose to ignore actions they did not want to ignore, and bear high cognitive, navigation, and information costs to access the information they needed.

HC · Nov 19, 2017
How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games

Jonathan Dodge, Sean Penney, Claudia Hilderbrand et al.

How should an AI-based explanation system explain an agent's complex behavior to ordinary end users who have no background in AI? Answering this question is an active research area, for if an AI-based explanation system could effectively explain intelligent agents' behavior, it could enable the end users to understand, assess, and appropriately trust (or distrust) the agents attempting to help them. To provide insights into this question, we turned to human expert explainers in the real-time strategy domain, "shoutcasters," to understand (1) how they foraged in an evolving strategy game in real time, (2) how they assessed the players' behaviors, and (3) how they constructed pertinent and timely explanations out of their insights and delivered them to their audience. The results provided insights into shoutcasters' foraging strategies for gleaning information necessary to assess and explain the players; a characterization of the types of implicit questions shoutcasters answered; and implications for creating explanations by using the patterns