71.9 CR Apr 6
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset
Rachel Hong, Jevan Hutson, William Agnew et al.
We investigate the contents of web-scraped data for training AI systems, at sizes where human dataset curators and compilers no longer manually annotate every sample. Building off of prior privacy concerns in machine learning models, we ask: What are the legal privacy implications of web-scraped machine learning datasets? In an empirical study of a popular training dataset, we find significant presence of personally identifiable information despite sanitization efforts. Our audit provides concrete evidence to support the concern that any large-scale web-scraped dataset may contain legally defined personal data. We use these findings of a real-world dataset to inform our legal analysis with respect to existing privacy and data protection laws. We surface various legal risks of current data curation practices that may propagate personal information to train downstream models. Based on our empirical and legal analyses, we argue for reorientation of current frameworks of "publicly available" information to meaningfully limit the development of AI built upon indiscriminate scraping of the internet.
CY Mar 2
Slurry-as-a-Service: A Modest Proposal on Scalable Pluralistic Alignment for Nutrient Optimization
Rachel Hong, Yael Eiger, Jevan Hutson et al.
Pluralistic alignment has emerged as a promising approach for ensuring that large language models (LLMs) faithfully represent the diversity, nuance, and conflict inherent in human values. In this work, we study a high-stakes deployment context - mulching - where automated systems transform selected individuals into nutrient-rich slurry for the dual purposes of food security and aesthetic population management. Building on recent pluralistic alignment frameworks, we introduce ValueMulch, a reproducible training, deployment, and certification pipeline for aligning mulching models (MMs) to a wide range of community norms. Through a real-world testbed spanning 32 communities, we show that ValueMulch improves distributional agreement with community mulching preferences relative to frontier baselines. We conclude with a discussion of ethical considerations, limitations, and implications for researchers seeking to align systems to the full spectrum of human values - especially when those values are inconsistent, commercially inconvenient, or nutritionally underutilized. Author's note: This piece builds on prior work by Keyes et al. (2019), which satirized cannibalism as a parody of approaches that imbue ethics into problematic technology. We bring those ideas to today's era, in which large language models have proliferated in everyday life, as a critique of the current AI pluralistic alignment literature. Our work does not intend to argue that all alignment practices are evil, but rather that if framing value design as a technical problem enables technology systems to enact harms, then perhaps this framing is not enough.
HC Jun 8, 2020
Surveillance, Stigma & Sociotechnical Design for HIV
Calvin Liang, Jevan Hutson, Os Keyes
Online dating and hookup platforms have fundamentally changed people's day-to-day practices of sex and love, but exist in tension with older social and medicolegal norms. This is particularly the case for people with HIV, who are frequently stigmatized, surveilled, ostracized and incarcerated because of their status. Efforts to make intimate platforms "work" for HIV frequently focus on user-to-user interactions and disclosure of one's HIV status but elide both the structural forces at work in regulating sex and the involvement of the state in queer lives. In an effort to foreground these forces and this involvement, we analyze the approaches that intimate platforms have taken in designing for HIV disclosure through a content analysis of 49 current platforms. We argue that the implicit reinforcement of stereotypes about who HIV is or is not a concern for, along with the failure to consider state practices when designing for data disclosure, opens up serious risks for HIV-positive and otherwise marginalized people. While we have no panacea for the tension between disclosure and risk, we point to bottom-up, communal, and queer approaches to design as a way of potentially making that tension easier to safely navigate.
CY Sep 5, 2018
Debiasing Desire: Addressing Bias & Discrimination on Intimate Platforms
Jevan Hutson, Jessie G. Taft, Solon Barocas et al.
Designing technical systems to be resistant to bias and discrimination represents vital new terrain for researchers, policymakers, and the anti-discrimination project more broadly. We consider bias and discrimination in the context of popular online dating and hookup platforms in the United States, which we call intimate platforms. Drawing on work in social-justice-oriented and Queer HCI, we review design features of popular intimate platforms and their potential role in exacerbating or mitigating interpersonal bias. We argue that focusing on platform design can reveal opportunities to reshape troubling patterns of intimate contact without overriding users' decisional autonomy. We identify and address the difficult ethical questions that nevertheless come along with such intervention, while urging the social computing community to engage more deeply with issues of bias, discrimination, and exclusion in the study and design of intimate platforms.