Benjamin Mako Hill

CY
h-index: 6
Papers: 19
Citations: 967
Novelty: 32%
AI Score: 27

19 Papers

AI · Nov 15, 2024
Generative Agent Simulations of 1,000 People

Joon Sung Park, Carolyn Q. Zou, Aaron Shaw et al.

The promise of human behavioral simulation--general-purpose computational agents that replicate human behavior across domains--could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals--applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior.

SE · Feb 27, 2021 · Code
Underproduction: An Approach for Measuring Risk in Open Source Software

Kaylea Champion, Benjamin Mako Hill

The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call 'underproduction' which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis both demonstrates the utility of our approach and reveals the existence of widespread underproduction in a range of widely installed software components in Debian.

HC · Feb 11, 2022
The Risks, Benefits, and Consequences of Prepublication Moderation: Evidence from 17 Wikipedia Language Editions

Chau Tran, Kaylea Champion, Benjamin Mako Hill et al.

Many online communities rely on postpublication moderation where contributors, even those that are perceived as being risky, are allowed to publish material immediately and where moderation takes place after the fact. An alternative arrangement involves moderating content before publication. A range of communities have argued against prepublication moderation by suggesting that it makes contributing less enjoyable for new members and that it will distract established community members with extra moderation work. We present an empirical analysis of the effects of a prepublication moderation system called FlaggedRevs that was deployed by several Wikipedia language editions. We used panel data from 17 large Wikipedia editions to test a series of hypotheses related to the effect of the system on activity levels and contribution quality. We found that the system was very effective at keeping low-quality contributions from ever becoming visible. Although there is some evidence that the system discouraged participation among users without accounts, our analysis suggests that the system's effects on contribution volume and quality were moderate at most. Our findings imply that concerns regarding the major negative effects of prepublication moderation systems on contribution quality and project productivity may be overstated.

SI · Jan 12, 2022
No Community Can Do Everything: Why People Participate in Similar Online Communities

Nathan TeBlunthuis, Charles Kiene, Isabella Brown et al.

Large-scale quantitative analyses have shown that individuals frequently talk to each other about similar things in different online spaces. Why do these overlapping communities exist? We provide an answer grounded in the analysis of 20 interviews with active participants in clusters of highly related subreddits. Within a broad topical area, there are a diversity of benefits an online community can confer. These include (a) specific information and discussion, (b) socialization with similar others, and (c) attention from the largest possible audience. A single community cannot meet all three needs. Our findings suggest that topical areas within an online community platform tend to become populated by groups of specialized communities with diverse sizes, topical boundaries, and rules. Compared with any single community, such systems of overlapping communities are able to provide a greater range of benefits.

CY · Nov 20, 2021
The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence From Peer Production

Benjamin Mako Hill, Aaron Shaw

Online communities, like Wikipedia, produce valuable public information goods. Whereas some of these communities require would-be contributors to create accounts, many do not. Does this requirement catalyze cooperation or inhibit participation? Prior research provides divergent predictions but little causal evidence. We conduct an empirical test using longitudinal data from 136 natural experiments where would-be contributors to wikis were suddenly required to log in to contribute. Requiring accounts leads to a small increase in account creation, but reduces both high- and low-quality contributions from registered and unregistered participants. Although the change deters a large portion of low-quality participation, the vast majority of deterred contributions are of higher quality. We conclude that requiring accounts introduces an undertheorized tradeoff for public goods production in interactive communication systems.

SE · Jul 29, 2021
Qualities of Quality: A Tertiary Review of Software Quality Measurement Research

Kaylea Champion, Sejal Khatri, Benjamin Mako Hill

This paper presents a tertiary review of software quality measurement research. To conduct this review, we examined an initial dataset of 7,811 articles and found 75 relevant and high-quality secondary analyses of software quality research. Synthesizing this body of work, we offer an overview of perspectives, measurement approaches, and trends. We identify five distinct perspectives that conceptualize quality as heuristic, as maintainability, as a holistic concept, as structural features of software, and as dependability. We also identify three key challenges. First, we find widespread evidence of validity questions with common measures. Second, we observe the application of machine learning methods without adequate evaluation. Third, we observe the use of aging datasets. Finally, from these observations, we sketch a path toward a theoretical framework that will allow software engineering researchers to systematically confront these weaknesses while remaining grounded in the experiences of developers and the real world in which code is ultimately deployed.

HC · Jul 14, 2021
Identifying Competition and Mutualism Between Online Groups

Nathan TeBlunthuis, Benjamin Mako Hill

Platforms often host multiple online groups with overlapping topics and members. How can researchers and designers understand how related groups affect each other? Inspired by population ecology, prior research in social computing and human-computer interaction has studied related groups by correlating group size with degrees of overlap in content and membership, but has produced puzzling results: overlap is associated with competition in some contexts but with mutualism in others. We suggest that this inconsistency results from aggregating intergroup relationships into an overall environmental effect that obscures the diversity of competition and mutualism among related groups. Drawing on the framework of community ecology, we introduce a time-series method for inferring competition and mutualism. We then use this framework to inform a large-scale analysis of clusters of subreddits that all have high user overlap. We find that mutualism is more common than competition.

HC · Aug 4, 2020
Designing for Critical Algorithmic Literacies

Sayamindu Dasgupta, Benjamin Mako Hill

As pervasive data collection and powerful algorithms increasingly shape children's experience of the world and each other, their ability to interrogate computational algorithms has become crucially important. A growing body of work has attempted to articulate a set of "literacies" to describe the intellectual tools that children can use to understand, interrogate, and critique the algorithmic systems that shape their lives. Unfortunately, because many algorithms are invisible, only a small number of children develop the literacies required to critique these systems. How might designers support the development of critical algorithmic literacies? Based on our experience designing two data programming systems, we present four design principles that we argue can help children develop literacies that allow them to understand not only how algorithms work, but also to critique and question them.

CY · Jun 4, 2020
Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia

Nathan TeBlunthuis, Benjamin Mako Hill, Aaron Halfaker

Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can lead to "overprofiling" bias when moderators focus on these signals but overlook the misbehavior of others. We propose that algorithmic flagging systems deployed to improve the efficiency of moderation work can also make moderation actions more fair to these users by reducing reliance on social signals and making norm violations by everyone else more visible. We analyze moderator behavior in Wikipedia as mediated by RCFilters, a system which displays social signals and algorithmic flags, and estimate the causal effect of being flagged on moderator actions. We show that algorithmically flagged edits are reverted more often, especially those by established editors with positive social signals, and that flagging decreases the likelihood that moderation actions will be undone. Our results suggest that algorithmic flagging systems can lead to increased fairness in some contexts but that the relationship is complex and contingent.

SI · Apr 8, 2019
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor

Chau Tran, Kaylea Champion, Andrea Forte et al.

User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia's ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users.

CY · Feb 3, 2017
A longitudinal dataset of five years of public activity in the Scratch online community

Benjamin Mako Hill, Andrés Monroy-Hernández

Scratch is a programming environment and an online community where young people can create, share, learn, and communicate. In collaboration with the Scratch Team at MIT, we created a longitudinal dataset of public activity in the Scratch online community during its first five years (2007-2012). The dataset comprises 32 tables with information on more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and more. To help researchers understand this dataset, and to establish the validity of the data, we also include the source code of every version of the software that operated the website, as well as the software used to generate this dataset. We believe this is the largest and most comprehensive downloadable dataset of youth programming artifacts and communication.

HC · Feb 1, 2017
Scratch Community Blocks: Supporting Children as Data Scientists

Sayamindu Dasgupta, Benjamin Mako Hill

In this paper, we present Scratch Community Blocks, a new system that enables children to programmatically access, analyze, and visualize data about their participation in Scratch, an online community for learning computer programming. At its core, our approach involves a shift in who analyzes data: from adult data scientists to young learners themselves. We first introduce the goals and design of the system and then demonstrate it by describing example projects that illustrate its functionality. Next, we show through a series of case studies how the system engages children in not only representing data and answering questions with data but also in self-reflection about their own learning and participation.

HC · May 28, 2016
Surviving an "Eternal September" - How an Online Community Managed a Surge of Newcomers

Charles Kiene, Andrés Monroy-Hernández, Benjamin Mako Hill

We present a qualitative analysis of interviews with participants in the NoSleep community within Reddit where millions of fans and writers of horror fiction congregate. We explore how the community handled a massive, sudden, and sustained increase in new members. Although existing theory and stories like Usenet's infamous "Eternal September" suggest that large influxes of newcomers can hurt online communities, our interviews suggest that NoSleep survived without major incident. We propose that three features of NoSleep allowed it to manage the rapid influx of newcomers gracefully: (1) an active and well-coordinated group of administrators, (2) a shared sense of community which facilitated community moderation, and (3) technological systems that mitigated norm violations. We also point to several important trade-offs and limitations.

CY · May 27, 2016
Remixing as a Pathway to Computational Thinking

Sayamindu Dasgupta, William Hale, Andrés Monroy-Hernández et al.

Theorists and advocates of "remixing" have suggested that appropriation can act as a pathway for learning. We test this theory quantitatively using data from more than 2.4 million multimedia programming projects shared by more than 1 million users in the Scratch online community. First, we show that users who remix more often have larger repertoires of programming commands even after controlling for the numbers of projects and amount of code shared. Second, we show that exposure to computational thinking concepts through remixing is associated with increased likelihood of using those concepts. Our results support theories that young people learn through remixing, and have important implications for designers of social computing systems.

CY · Jul 5, 2015
The Cost of Collaboration for Code and Art: Evidence from a Remixing Community

Benjamin Mako Hill, Andrés Monroy-Hernández

In this paper, we use evidence from a remixing community to evaluate two pieces of common wisdom about collaboration. First, we test the theory that jointly produced works tend to be of higher quality than individually authored products. Second, we test the theory that collaboration improves the quality of functional works like code, but that it works less well for artistic works like images and sounds. We use data from Scratch, a large online community where hundreds of thousands of young users share and remix millions of animations and interactive games. Using peer-ratings as a measure of quality, we estimate a series of fitted regression models and find that collaborative Scratch projects tend to receive ratings that are lower than individually authored works. We also find that code-intensive collaborations are rated higher than media-intensive efforts. We conclude by discussing the limitations and implications of these findings.

CY · Jul 5, 2015
The Remixing Dilemma: The Trade-off Between Generativity and Originality

Benjamin Mako Hill, Andrés Monroy-Hernández

In this paper we argue that there is a trade-off between generativity and originality in online communities that support open collaboration. We build on foundational theoretical work in peer production to formulate and test a series of hypotheses suggesting that the generativity of creative works is associated with moderate complexity, prominent authors, and cumulativeness. We also formulate and test three hypotheses that these qualities are associated with decreased originality in resulting derivatives. Our analysis uses a rich data set from the Scratch Online Community -- a large website where young people openly share and remix animations and video games. We discuss the implications of this trade-off for the design of peer production systems that support amateur creativity.

HC · Jul 5, 2015
Computers Can't Give Credit: How Automatic Attribution Falls Short in an Online Remixing Community

Andrés Monroy-Hernández, Benjamin Mako Hill, Jazmin Gonzalez-Rivero et al.

In this paper, we explore the role that attribution plays in shaping user reactions to content reuse, or remixing, in a large user-generated content community. We present two studies using data from the Scratch online community -- a social media platform where hundreds of thousands of young people share and remix animations and video games. First, we present a quantitative analysis that examines the effects of a technological design intervention introducing automated attribution of remixes on users' reactions to being remixed. We compare this analysis to a parallel examination of "manual" credit-giving. Second, we present a qualitative analysis of twelve in-depth, semi-structured interviews with Scratch participants on the subject of remixing and attribution. Results from both studies suggest that automatic attribution done by technological systems (i.e., the listing of names of contributors) plays a role that is distinct from, and less valuable than, credit which may superficially involve identical information but takes on new meaning when it is given by a human remixer. We discuss the implications of these findings for the designers of online communities and social media platforms.

HC · Jul 5, 2015
Responses to remixing on a social media sharing website

Benjamin Mako Hill, Andrés Monroy-Hernández, Kristina R. Olson

In this paper we describe the ways participants of the Scratch online community, primarily young people, engage in remixing of each other's shared animations, games, and interactive projects. In particular, we try to answer the following questions: How do users respond to remixing in a social media environment where remixing is explicitly permitted? What qualities of originators and their projects correspond to a higher likelihood of plagiarism accusations? Is there a connection between plagiarism complaints and similarities between a remix and the work it is based on? Our findings indicate that users have a very wide range of reactions to remixing and that as many users react positively as accuse remixers of plagiarism. We test several hypotheses that might explain the high number of plagiarism accusations related to original project complexity, cumulative remixing, originators' integration into remixing practice, and remixee-remixer project similarity, and find support for the first and last explanations.

CY · Jun 30, 2014
WeDo: Exploring Participatory, End-To-End Collective Action

Haoqi Zhang, Andrés Monroy-Hernández, Aaron Shaw et al.

Many celebrate the Internet's ability to connect individuals and facilitate collective action toward a common goal. While numerous systems have been designed to support particular aspects of collective action, few systems support participatory, end-to-end collective action in which a crowd or community identifies opportunities, formulates goals, brainstorms ideas, develops plans, mobilizes, and takes action. To explore the possibilities and barriers in supporting such interactions, we have developed WeDo, a system aimed at promoting simple forms of participatory, end-to-end collective action. Pilot deployments of WeDo illustrate that sociotechnical systems can support automated transitions through different phases of end-to-end collective action, but that challenges, such as the elicitation of leadership and the accommodation of existing group norms, remain.