AI, Feb 17
This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
Jessica Hullman, David Broska, Huaman Sun et al.
A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science experiments. However, there is limited guidance on when such simulations support valid inference about human behavior. We contrast two strategies for obtaining valid estimates of causal effects and clarify the assumptions under which each is suitable for exploratory versus confirmatory research. Heuristic approaches seek to establish that simulated and observed human behavior are interchangeable through prompt engineering, model fine-tuning, and other repair strategies designed to reduce LLM-induced inaccuracies. While useful for many exploratory tasks, heuristic approaches lack the formal statistical guarantees typically required for confirmatory research. In contrast, statistical calibration combines auxiliary human data with statistical adjustments to account for discrepancies between observed and simulated responses. Under explicit assumptions, statistical calibration preserves validity and provides more precise estimates of causal effects at lower cost than experiments that rely solely on human participants. Yet the potential of both approaches depends on how well LLMs approximate the relevant populations. We consider what opportunities are overlooked when researchers focus myopically on substituting LLMs for human participants in a study.
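The calibration idea in this abstract can be illustrated with a small sketch. The abstract does not say which statistical adjustment the authors use, so the code below assumes a simple difference-style correction: take the mean of a large batch of simulated responses and shift it by the average human-minus-simulated gap measured on a small paired sample. The function names and the toy numbers are hypothetical and only show the arithmetic.

```python
from statistics import mean


def calibrated_mean(sim_large, sim_paired, human_paired):
    """Mean of many simulated responses, corrected by the average
    human-minus-simulated gap observed on a small paired sample.

    Assumption: this difference-style correction stands in for the
    "statistical adjustments" the abstract mentions; it is not taken
    from the paper itself.
    """
    if len(sim_paired) != len(human_paired):
        raise ValueError("paired samples must have the same length")
    if not sim_large or not sim_paired:
        raise ValueError("samples must be non-empty")
    # Average discrepancy between human and simulated answers.
    gap = mean(h - s for h, s in zip(human_paired, sim_paired))
    return mean(sim_large) + gap


def calibrated_effect(treatment, control):
    """Difference of calibrated means between two experimental arms.

    Each arm is a tuple: (sim_large, sim_paired, human_paired).
    """
    return calibrated_mean(*treatment) - calibrated_mean(*control)


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration.
    treatment = ([4.0, 5.0, 6.0], [4.0, 6.0], [5.0, 7.0])
    control = ([1.0, 2.0, 3.0], [1.0, 2.0], [2.0, 3.0])
    print(calibrated_effect(treatment, control))
```

If the simulated responses were already unbiased, the gap term would average out to roughly zero and the estimate would rest almost entirely on the cheap simulated sample. The sketch omits the variance estimate that a confirmatory analysis would also need.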
AI, Nov 15, 2024
Generative Agent Simulations of 1,000 People
Joon Sung Park, Carolyn Q. Zou, Aaron Shaw et al.
The promise of human behavioral simulation--general-purpose computational agents that replicate human behavior across domains--could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals--applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior.
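The "85% as accurately" figure is a ratio rather than a raw accuracy: the agent's agreement with a participant is divided by how well that participant agrees with their own answers two weeks later. A minimal sketch of that normalization follows, assuming simple exact-match agreement on categorical answers. The helper names and the example answers are hypothetical and not taken from the paper.

```python
def agreement(answers_a, answers_b):
    """Fraction of items on which two answer lists match exactly."""
    if len(answers_a) != len(answers_b) or not answers_a:
        raise ValueError("answer lists must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


def normalized_accuracy(agent, wave1, wave2):
    """Agent accuracy relative to the participant's own test-retest consistency.

    agent: the agent's predicted answers
    wave1: the participant's original answers
    wave2: the same participant's answers two weeks later
    """
    self_consistency = agreement(wave1, wave2)
    if self_consistency == 0:
        raise ValueError("participant has zero self-consistency")
    return agreement(agent, wave1) / self_consistency


if __name__ == "__main__":
    # Made-up answers to five survey items, for illustration only.
    wave1 = ["agree", "no", "often", "yes", "neutral"]
    wave2 = ["agree", "no", "often", "yes", "disagree"]
    agent = ["agree", "no", "rarely", "yes", "disagree"]
    print(normalized_accuracy(agent, wave1, wave2))
```

Dividing by self-consistency sets the ceiling at what a person can reproduce of their own answers, so a normalized score of 1.0 would mean the agent matches the participant as well as the participant matches themselves.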
SE, Mar 13
Linguistic Similarity Within Centralized FLOSS Development
Matthew Gaughan, Aaron Shaw, Darren Gergle
When free/libre and open source software (FLOSS) stewards centralize project development, they potentially undermine project sustainability and impact how contributors talk to each other. To study the relationship between steward-centralized development and contributor discussion, we compared the development of three Wikimedia platform features that the Wikimedia Foundation (WMF) built in MediaWiki. In a mixed-methods multi-case comparison, we used repository mining, linguistic style features, and principal component analysis to track MediaWiki feature development and issue discussions. Contrary to both our intuition and prior work, there were no identifiable differences in the linguistic style of WMF-affiliates and external contributors, even when feature development was guided by WMF contributions. From these results, we offer two provocations to the study of collaborative FLOSS development: (1) stewards dominate development according to their own use of specific project functionality; (2) centralized project development does not entail hierarchical language within project discussions.
CY, Nov 20, 2021
The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence From Peer Production
Benjamin Mako Hill, Aaron Shaw
Online communities, like Wikipedia, produce valuable public information goods. Whereas some of these communities require would-be contributors to create accounts, many do not. Does this requirement catalyze cooperation or inhibit participation? Prior research provides divergent predictions but little causal evidence. We conduct an empirical test using longitudinal data from 136 natural experiments where would-be contributors to wikis were suddenly required to log in to contribute. Requiring accounts leads to a small increase in account creation, but reduces both high- and low-quality contributions from registered and unregistered participants. Although the change deters a large portion of low-quality participation, the vast majority of deterred contributions are of higher quality. We conclude that requiring accounts introduces an undertheorized tradeoff for public goods production in interactive communication systems.
SI, Nov 4, 2016
Black Lives Matter in Wikipedia: Collaboration and Collective Memory around Online Social Movements
Marlon Twyman, Brian C. Keegan, Aaron Shaw
Social movements use social computing systems to complement offline mobilizations, but prior literature has focused almost exclusively on movement actors' use of social media. In this paper, we analyze participation in and attention to topics connected with the Black Lives Matter movement in the English language version of Wikipedia between 2014 and 2016. Our results point to the use of Wikipedia to (1) intensively document and connect historical and contemporary events, (2) collaboratively migrate activity to support coverage of new events, and (3) dynamically re-appraise pre-existing knowledge in the aftermath of new events. These findings reveal patterns of behavior that complement theories of collective memory and collective action and help explain how social computing systems can encode and retrieve knowledge about social movements as they unfold.
CY, Jun 30, 2014
WeDo: Exploring Participatory, End-To-End Collective Action
Haoqi Zhang, Andrés Monroy-Hernández, Aaron Shaw et al.
Many celebrate the Internet's ability to connect individuals and facilitate collective action toward a common goal. While numerous systems have been designed to support particular aspects of collective action, few systems support participatory, end-to-end collective action in which a crowd or community identifies opportunities, formulates goals, brainstorms ideas, develops plans, mobilizes, and takes action. To explore the possibilities and barriers in supporting such interactions, we have developed WeDo, a system aimed at promoting simple forms of participatory, end-to-end collective action. Pilot deployments of WeDo illustrate that sociotechnical systems can support automated transitions through different phases of end-to-end collective action, but that challenges, such as the elicitation of leadership and the accommodation of existing group norms, remain.