Jennah Gosciak

CY
3 papers
1 citation
Novelty: 25%
AI Score: 37

3 Papers

57.1 · CY · May 17
Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management

Jennah Gosciak, Luke Boyce, Angelina Wang et al.

Cities are increasingly turning to large-scale data analysis and machine learning to make consequential decisions. While the algorithmic fairness community has focused on analyzing the risks and benefits associated with these complex methods, there has been much less scrutiny of the many simpler, but still widely used, data-driven tools that support government decision-making in a variety of settings. In this work, we study hand-crafted indices for geographic targeting and decision-making in emergency management -- a field responsible for coordinating preparedness and response efforts to hazards ranging from natural disasters to human threats. Indices, which capture abstract principles and overarching priorities (e.g., reducing social vulnerability), are low-complexity models that statistically aggregate chosen variables. They are generally flexible and interpretable, but can also be sensitive to key design choices and require strong assumptions. Through a case study of decision-making for extreme heat emergencies in NYC, we examine the challenges that practitioners may face in selecting an index for preparedness and response actions. We map empirical findings from index-based simulations to concerns related to validity and reliability from the measurement literature and show via sensitivity analyses that different reasonable choices of input variables or spatial scale can result in substantive differences to index risk scores, thereby affecting downstream government decision-making. We contrast these challenges with considerations for developing predictive algorithms that more narrowly relate to concrete, measurable outcomes. Ultimately, we provide generalizable recommendations that practitioners and public-sector technologists can use for navigating the trade-offs between indices and predictive algorithms in other government settings.

83.7 · HC · Mar 11
LLMs in social services: How does chatbot accuracy affect human accuracy?

Jennah Gosciak, Eric Giannella, Zhaowen Guo et al.

Social service programs like the Supplemental Nutrition Assistance Program (SNAP, or food stamps) have eligibility rules that can be challenging to understand. For nonprofit caseworkers who often support clients in navigating a dozen or more complex programs, LLM-based chatbots may offer a means to provide better, faster help to clients whose situations may be less common. In this paper, we measure the potential effects of LLM-based chatbot suggestions on caseworkers' ability to provide accurate guidance. We first created a 770-question multiple-choice benchmark dataset of difficult, but realistic questions that a caseworker might receive. Next, using these benchmark questions and corresponding expert-verified answers, we conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles. Caseworkers in the control condition did not see chatbot suggestions and had a mean accuracy of 49%. Caseworkers in the treatment condition saw chatbot suggestions that we artificially varied to range in aggregate accuracy from low (53%) to high (100%). Caseworker performance significantly improves as chatbot quality improves: high-quality chatbots (96-100% accurate) improved caseworker accuracy by 27 percentage points. At the question-level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best (without chatbot suggestions). Finally, improvements in caseworker accuracy level off as chatbot accuracy increases, a phenomenon that we call the "AI underreliance plateau," which is a concern for real-world deployment and highlights the importance of evaluating human-in-the-loop tools with their users.

42.4 · CY · May 4
A Critical Pragmatism Approach for Algorithmic Fairness: Lessons from Urban Planning Theory

Jennah Gosciak, Karen Levy, Allison Koenecke

As data scientists grapple with increasingly complex ethical decisions in machine learning (ML) and data science, the field of algorithmic fairness has offered multiple solutions, from formal mathematical definitions to holistic notions of fairness drawn from various academic disciplines. However, navigating and implementing these fairness approaches in practice remains an ongoing challenge. In this paper, we draw a parallel between the types of problems arising in algorithmic fairness and urban planning. We frame algorithmic fairness problems as "wicked problems," a term originating from the planning and policy space to describe the intractable, value-laden, and complex nature of this work. As such, we argue that the field of algorithmic fairness can learn from theoretical work in urban planning in ameliorating its own set of wicked problems. Urban planning is typically concerned with practical issues of governance, resource allocation, stakeholder engagement, and conflicts involving deep-seated differences. These are challenges that existing fairness frameworks can easily overlook. We present a flexible framework for designing fairer algorithms based on the urban planning theory approach of critical pragmatism -- a reflective and deliberative approach to addressing wicked problems that considers what practitioners actually do in the face of conflict and power. We provide specific recommendations and apply them to several case studies in ML and algorithm design: automated mortgage lending, school choice, and feminicide counterdata collection. Researchers and practitioners can incorporate these recommendations derived from urban planning into their ongoing work to more holistically address practical problems arising in fair algorithm design.