Emma Harvey

CY
h-index: 20
4 papers
17 citations
Novelty: 25%
AI Score: 39

4 Papers

CY · May 14
Tradeoffs are Domain Dependent: Improving Accuracy and Fairness in Property Tax Assessments

Evelyn Smith, Emma Harvey, Christopher Berry et al.

Algorithmic fairness research often assumes a tradeoff between fairness and accuracy. Yet this tradeoff may not be universal. We test this assumption in the context of U.S. property tax assessment - a setting in which the output of predictive algorithms directly determines the distribution of tax obligations among homeowners. Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system. Using data on 26 million property sales spanning 95% of U.S. counties, we conduct three complementary analyses. First, we find that assessment accuracy and fairness - measured using domain-relevant metrics - are strongly correlated across counties under status quo practices. Second, in simulated assessment models, we show that adding property features improves accuracy in most cases, and that when accuracy improves, fairness almost always improves as well. Third, we show that incorporating publicly available Census data into assessment models - a feasible reform in most counties - would significantly improve both accuracy and fairness relative to status quo assessments. Together, these results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.

CY · Jun 4, 2025
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Emma Harvey, Emily Sheng, Su Lin Blodgett et al. · microsoft-research

The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments - even useful instruments - are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.

HC · May 26, 2025
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems

Jing Nathan Yan, Emma Harvey, Junxiong Wang et al.

Recommender systems (RS), which are widely deployed across high-stakes domains, are susceptible to biases that can cause large-scale societal impacts. Researchers have proposed methods to measure and mitigate such biases -- but translating academic theory into practice is inherently challenging. RS practitioners must balance the competing interests of diverse stakeholders, including providers and users, and operate in dynamic environments. Through a semi-structured interview study (N=11), we map the RS practitioner workflow within large technology companies, focusing on how technical teams consider fairness internally and in collaboration with other (legal, data, and fairness) teams. We identify key challenges to incorporating fairness into existing RS workflows: defining fairness in RS contexts, particularly when navigating multi-stakeholder and dynamic fairness considerations. We also identify key organization-wide challenges: making time for fairness work and facilitating cross-team communication. Finally, we offer actionable recommendations for the RS community, including HCI researchers and practitioners.

CY · Jan 19, 2024
The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology

Emma Harvey, Hauke Sandhaus, Abigail Z. Jacobs et al.

Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.