Mona Sloane

h-index: 13

5 papers

503 citations

Novelty: 30%

AI Score: 39

Ranked #77,992 of 194,257 authors (top 40%); #164 in CY (top 17%)

5 Papers

10.9 | CY | Apr 6

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings

Emanuel Moss, Elizabeth Watkins, Christopher Persaud et al.

As generative AI technologies are pressed into service in workplace settings, current approaches to account for the contexts in which such technologies are used fall short of users' expectations and needs. This paper empirically demonstrates, through expert interviews, both how these tools fail to account for users' context and how users deploy concrete strategies to address such failures. The paper analyzes how context is variously conceptualized by tool developers, users, and social scientists to identify specific pitfalls inherent in computational approaches to context. Multiple distinct contexts tend to collapse into one another or rot, degrading over time, reducing the utility of any efforts to account for context. The paper concludes with a provocation to shift from an indiscriminate collection of context-relevant data toward a more interactional set of practices to embed GenAI systems more appropriately into users' contexts of use.

8.0 | CY | Jan 23, 2022 | Code

An External Stability Audit Framework to Test the Validity of Personality Prediction in AI Hiring

Alene K. Rhea, Kelsey Markey, Lauren D'Arinzo et al.

Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Our approach is to (a) develop a methodology for an external audit of stability of predictions made by algorithmic personality tests, and (b) instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Crucially, rather than challenging or affirming the assumptions made in psychometric testing -- that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit, and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with the audit of two real-world personality prediction systems, namely Humantic AI and Crystal. The application of our audit framework demonstrates that both these systems show substantial instability with respect to key facets of measurement, and hence cannot be considered valid testing instruments.

18.1 | CL | Feb 12, 2024 | Code

Careless Whisper: Speech-to-Text Hallucination Harms

Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei et al.

Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper's transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations -- a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.

2.3 | CY | Jan 19, 2024

The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology

Emma Harvey, Hauke Sandhaus, Abigail Z. Jacobs et al.

Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.

28.4 | CY | Jul 5, 2020

Participation is not a Design Fix for Machine Learning

Mona Sloane, Emanuel Moss, Olaitan Awomolo et al.

This paper critically examines existing modes of participation in design practice and machine learning. Cautioning against 'participation-washing', it suggests that the ML community must become attuned to possibly exploitative and extractive forms of community involvement and shift away from the prerogatives of context-independent scalability.