Karen Levy

CY
h-index 37
16 papers
1,347 citations
Novelty 26%
AI Score 47

16 Papers

AI Oct 27, 2022
Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

Michael L. Littman, Ifeoma Ajunwa, Guy Berger et al.

In September 2021, the "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the second report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Michael Littman of Brown University. The report, entitled "Gathering Strength, Gathering Storms," answers a set of 14 questions probing critical areas of AI development addressing the major risks and dangers of AI, its effects on society, its public perception and the future of the field. The report concludes that AI has made a major leap from the lab to people's lives in recent years, which increases the urgency to understand its potential negative effects. The questions were developed by the AI100 Standing Committee, chaired by Peter Stone of the University of Texas at Austin, consisting of a group of AI leaders with expertise in computer science, sociology, ethics, economics, and other disciplines.

CY Mar 8, 2022
An Uncommon Task: Participatory Design in Legal AI

Fernando Delgado, Solon Barocas, Karen Levy

Despite growing calls for participation in AI design, there are to date few empirical studies of what these processes look like and how they can be structured for meaningful engagement with domain experts. In this paper, we examine a notable yet understudied AI design process in the legal domain that took place over a decade ago, the impact of which still informs legal automation efforts today. Specifically, we examine the design and evaluation activities that took place from 2006 to 2011 within the Text REtrieval Conference's (TREC) Legal Track, a computational research venue hosted by the National Institute of Standards and Technology. The Legal Track of TREC is notable in the history of AI research and practice because it relied on a range of participatory approaches to facilitate the design and evaluation of new computational techniques--in this case, for automating attorney document review for civil litigation matters. Drawing on archival research and interviews with coordinators of the Legal Track of TREC, our analysis reveals how an interactive simulation methodology allowed computer scientists and lawyers to become co-designers and helped bridge the chasm between computational research and real-world, high-stakes litigation practice. In analyzing this case from the recent past, our aim is to empirically ground contemporary critiques of AI development and evaluation and the calls for greater participation as a means to address them.

GT Jan 28, 2023
Informational Diversity and Affinity Bias in Team Growth Dynamics

Hoda Heidari, Solon Barocas, Jon Kleinberg et al.

Prior work has provided strong evidence that, within organizational settings, teams that bring a diversity of information and perspectives to a task are more effective than teams that do not. If this form of informational diversity confers performance advantages, why do we often see largely homogeneous teams in practice? One canonical argument is that the benefits of informational diversity are in tension with affinity bias. To better understand the impact of this tension on the makeup of teams, we analyze a sequential model of team formation in which individuals care about their team's performance (captured in terms of accurately predicting some future outcome based on a set of features) but experience a cost as a result of interacting with teammates who use different approaches to the prediction task. Our analysis of this simple model reveals a set of subtle behaviors that team-growth dynamics can exhibit: (i) from certain initial team compositions, they can make progress toward better performance but then get stuck partway to optimally diverse teams; while (ii) from other initial compositions, they can also move away from this optimal balance as the majority group tries to crowd out the opinions of the minority. The initial composition of the team can determine whether the dynamics will move toward or away from performance optimality, painting a path-dependent picture of inefficiencies in team compositions. Our results formalize a fundamental limitation of utility-based motivations to drive informational diversity in organizations and hint at interventions that may improve informational diversity and performance simultaneously.

CY Nov 29, 2022
Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support

Charlotte Schluger, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil et al.

To address the widespread problem of uncivil behavior, many online discussion platforms employ human moderators to take action against objectionable content, such as removing it or placing sanctions on its authors. This reactive paradigm of taking action against already-posted antisocial content is currently the most common form of moderation, and has accordingly underpinned many recent efforts at introducing automation into the moderation process. Comparatively less work has been done to understand other moderation paradigms -- such as proactively discouraging the emergence of antisocial behavior rather than reacting to it -- and the role algorithmic support can play in these paradigms. In this work, we investigate such a proactive framework for moderation in a case study of a collaborative setting: Wikipedia Talk Pages. We employ a mixed methods approach, combining qualitative and design components for a holistic analysis. Through interviews with moderators, we find that despite a lack of technical and social support, moderators already engage in a number of proactive moderation behaviors, such as preemptively intervening in conversations to keep them on track. Further, we explore how automation could assist with this existing proactive moderation workflow by building a prototype tool, presenting it to moderators, and examining how the assistance it provides might fit into their workflow. The resulting feedback uncovers both strengths and drawbacks of the prototype tool and suggests concrete steps towards further developing such assisting technology so it can most effectively support moderators in their existing proactive moderation workflow.

LG Sep 8, 2023
On the Actionability of Outcome Prediction

Lydia T. Liu, Solon Barocas, Jon Kleinberg et al.

Predicting future outcomes is a prevalent application of machine learning in social impact domains. Examples range from predicting student success in education to predicting disease risk in healthcare. Practitioners recognize that the ultimate goal is not just to predict but to act effectively. Increasing evidence suggests that relying on outcome predictions for downstream interventions may not have desired results. In most domains there exists a multitude of possible interventions for each individual, making the challenge of taking effective action more acute. Even when causal mechanisms connecting the individual's latent states to outcomes are well understood, in any given instance (a specific student or patient), practitioners still need to infer -- from budgeted measurements of latent states -- which of many possible interventions will be most effective for this individual. With this in mind, we ask: when are accurate predictors of outcomes helpful for identifying the most suitable intervention? Through a simple model encompassing actions, latent states, and measurements, we demonstrate that pure outcome prediction rarely results in the most effective policy for taking actions, even when combined with other measurements. We find that except in cases where there is a single decisive action for improving the outcome, outcome prediction never maximizes "action value", the utility of taking actions. Making measurements of actionable latent states, where specific actions lead to desired outcomes, considerably enhances the action value compared to outcome prediction, and the degree of improvement depends on action costs and the outcome model. This analysis emphasizes the need to go beyond generic outcome prediction in interventional settings by incorporating knowledge of plausible actions and latent states.

CY Oct 5, 2023
Strategic Evaluation: Subjects, Evaluators, and Society

Benjamin Laufer, Jon Kleinberg, Karen Levy et al.

A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depends on the moral standing of the evaluations and incentives that provoke such behaviors.

46.7 CY May 15
Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement

Andrea Wen-Yi Wang, Waki Kamino, David Mimno et al.

Clearly-defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from Science and Technology Studies (STS), we show that such distance exists because (1) historically, the "ground truth" of the strike zone is contested: the rule in practice has always reflected a hybrid between the rulebook definition and umpires' enforcement decisions; and (2) the use of ABS is embedded in an existing ecosystem, where the implementation of a technological enforcement system needs to balance multiple stakeholder values. This perspective challenges conventional evaluation paradigms that center on the distance between a formalized rule and its technological implementation, and instead calls for evaluating how such systems are experienced in practice. Addressing this question requires in-depth social science approaches, contributing to ongoing conversations in FAccT about the implementation and evaluation of sociotechnical systems.

HC Feb 5
Beyond Community Notes: A Framework for Understanding and Building Crowdsourced Context Systems for Social Media

Travis Lloyd, Tung Nguyen, Karen Levy et al.

Social media platforms are increasingly adopting features that display crowdsourced context alongside posts, a technique pioneered by X's Community Notes. These systems -- which we term Crowdsourced Context Systems (CCS) -- have the potential to reshape the information ecosystem as major platforms embrace them as alternatives to professional fact-checking. To understand the features and implications of these systems, we conduct a systematic literature review of existing CCS research (n=56) and analyze real-world CCS implementations. Based on our analysis, we develop a framework with two components. First, we present a theoretical model to conceptualize and define CCS. Second, we identify a design space encompassing six aspects: participation, inputs, curation, presentation, platform treatment, and transparency. We also surface normative implications of different CCS design and implementation choices. Our work integrates theoretical, design, and ethical perspectives to establish a foundation for future human-centered research on Crowdsourced Context Systems.

42.4 CY May 4
A Critical Pragmatism Approach for Algorithmic Fairness: Lessons from Urban Planning Theory

Jennah Gosciak, Karen Levy, Allison Koenecke

As data scientists grapple with increasingly complex ethical decisions in machine learning (ML) and data science, the field of algorithmic fairness has offered multiple solutions, from formal mathematical definitions to holistic notions of fairness drawn from various academic disciplines. However, navigating and implementing these fairness approaches in practice remains an ongoing challenge. In this paper, we draw a parallel between the types of problems arising in algorithmic fairness and urban planning. We frame algorithmic fairness problems as 'wicked problems,' a term originating from the planning and policy space to describe the intractable, value-laden, and complex nature of this work. As such, we argue that the field of algorithmic fairness can learn from theoretical work in urban planning in ameliorating its own set of wicked problems. Urban planning is typically concerned with practical issues of governance, resource allocation, stakeholder engagement, and conflicts involving deep-seated differences. These are challenges that existing fairness frameworks can easily overlook. We present a flexible framework for designing fairer algorithms based on the urban planning theory approach of critical pragmatism -- a reflective and deliberative approach to addressing wicked problems that considers what practitioners actually do in the face of conflict and power. We provide specific recommendations and apply them to several case studies in ML and algorithm design: automated mortgage lending, school choice, and feminicide counterdata collection. Researchers and practitioners can incorporate these recommendations derived from urban planning into their ongoing work to more holistically address practical problems arising in fair algorithm design.

GT Jun 3, 2025
Designing Algorithmic Delegates: The Role of Indistinguishability in Human-AI Handoff

Sophie Greenwood, Karen Levy, Solon Barocas et al.

As AI technologies improve, people are increasingly willing to delegate tasks to AI agents. In many cases, the human decision-maker chooses whether to delegate to an AI agent based on properties of the specific instance of the decision-making problem they are facing. Since humans typically lack full awareness of all the factors relevant to this choice for a given decision-making instance, they perform a kind of categorization by treating indistinguishable instances -- those that have the same observable features -- as the same. In this paper, we define the problem of designing the optimal algorithmic delegate in the presence of categories. This is an important dimension in the design of algorithms to work with humans, since we show that the optimal delegate can be an arbitrarily better teammate than the optimal standalone algorithmic agent. The solution to this optimal delegation problem is not obvious: we discover that this problem is fundamentally combinatorial, and illustrate the complex relationship between the optimal design and the properties of the decision-making task even in simple settings. Indeed, we show that finding the optimal delegate is computationally hard in general. However, we are able to find efficient algorithms for producing the optimal delegate in several broad cases of the problem, including when the optimal action may be decomposed into functions of features observed by the human and the algorithm. Finally, we run computational experiments to simulate a designer updating an algorithmic delegate over time to be optimized for when it is actually adopted by users, and show that while this process does not recover the optimal delegate in general, the resulting delegate often performs quite well.

CY May 13, 2025
One Bad NOFO? AI Governance in Federal Grantmaking

Dan Bateyko, Karen Levy

Much scholarship considers how U.S. federal agencies govern artificial intelligence (AI) through rulemaking and their own internal use policies. But agencies have an overlooked AI governance role: setting discretionary grant policy when directing billions of dollars in federal financial assistance. These dollars enable state and local entities to study, create, and use AI. This funding not only goes to dedicated AI programs, but also to grantees using AI in the course of meeting their routine grant objectives. As discretionary grantmakers, agencies guide and restrict what grant winners do -- a hidden lever for AI governance. Agencies pull this lever by setting program objectives, judging criteria, and restrictions for AI use. Using a novel dataset of over 40,000 non-defense federal grant notices of funding opportunity (NOFOs) posted to the U.S. federal grants website between 2009 and 2024, we analyze how agencies regulate the use of AI by grantees. We select records mentioning AI and review their stated goals and requirements. We find agencies promoting AI in notice narratives, shaping adoption in ways other records of grant policy might fail to capture. Of the grant opportunities that mention AI, we find only a handful of AI-specific judging criteria or restrictions. This silence holds even when agencies fund AI uses in contexts affecting people's rights and which, under an analogous federal procurement regime, would result in extra oversight. These findings recast grant notices as a site of AI policymaking -- albeit one that is developing out of step with other regulatory efforts and incomplete in its consideration of transparency, accountability, and privacy protections. The paper concludes by drawing lessons from AI procurement scholarship, while identifying distinct challenges in grantmaking that invite further study.

CY May 26, 2021
Computer Vision and Conflicting Values: Describing People with Automated Alt Text

Margot Hanley, Solon Barocas, Karen Levy et al.

Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, age, etc., and the company's decisions about whether to present these terms in alt text. We then describe an alternative -- and manual -- approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.

HC Feb 10, 2021
Artificial intelligence in communication impacts language and social relationships

Jess Hohenstein, Dominic DiFranzo, Rene F. Kizilcec et al.

Artificial intelligence (AI) is now widely used to facilitate social interaction, but its impact on social relationships and communication is not well understood. We study the social consequences of one of the most pervasive AI applications: algorithmic response suggestions ("smart replies"). Two randomized experiments (n = 1036) provide evidence that a commercially-deployed AI changes how people interact with and perceive one another in pro-social and anti-social ways. We find that using algorithmic responses increases communication efficiency, use of positive emotional language, and positive evaluations by communication partners. However, consistent with common assumptions about the negative implications of AI, people are evaluated more negatively if they are suspected to be using algorithmic responses. Thus, even though AI can increase communication efficiency and improve interpersonal perceptions, it risks changing users' language production and continues to be viewed negatively.

CY Jul 4, 2020
Accuracy-Efficiency Trade-Offs and Accountability in Distributed ML Systems

A. Feder Cooper, Karen Levy, Christopher De Sa

Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examining these trade-offs has been useful for guiding governance in other domains, we need to similarly reckon with these trade-offs in governing computer systems. We focus our analysis on distributed machine learning systems. Understanding the policy implications in this area is particularly urgent because such systems, which include autonomous vehicles, tend to be high-stakes and safety-critical. We 1) describe how the trade-off takes shape for these systems, 2) highlight gaps between existing US risk assessment standards and what these systems require to be properly assessed, and 3) make specific calls to action to facilitate accountability when hypothetical risks concerning the accuracy-efficiency trade-off become realized as accidents in the real world. We close by discussing how such accountability mechanisms encourage more just, transparent governance aligned with public values.

CY Jun 21, 2019
Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices

Manish Raghavan, Solon Barocas, Jon Kleinberg et al.

There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their practices, focusing particularly on efforts to detect and mitigate bias. Our analysis considers both technical and legal perspectives. Technically, we consider the various choices vendors make regarding data collection and prediction targets, and explore the risks and trade-offs that these choices pose. We also discuss how algorithmic de-biasing techniques interface with, and create challenges for, antidiscrimination law.

CY Sep 5, 2018
Debiasing Desire: Addressing Bias & Discrimination on Intimate Platforms

Jevan Hutson, Jessie G. Taft, Solon Barocas et al.

Designing technical systems to be resistant to bias and discrimination represents vital new terrain for researchers, policymakers, and the anti-discrimination project more broadly. We consider bias and discrimination in the context of popular online dating and hookup platforms in the United States, which we call intimate platforms. Drawing on work in social-justice-oriented and Queer HCI, we review design features of popular intimate platforms and their potential role in exacerbating or mitigating interpersonal bias. We argue that focusing on platform design can reveal opportunities to reshape troubling patterns of intimate contact without overriding users' decisional autonomy. We identify and address the difficult ethical questions that nevertheless come along with such intervention, while urging the social computing community to engage more deeply with issues of bias, discrimination, and exclusion in the study and design of intimate platforms.