Benjamin Mako Hill

h-index: 26

Papers: 15

Citations: 788

Novelty: 33%

AI Score: 28

Ranked #151,553 of 194,257 authors (top 78%); #1,382 in HC (top 55%)

15 Papers

45.0 | AI | Nov 15, 2024

Generative Agent Simulations of 1,000 People

Joon Sung Park, Carolyn Q. Zou, Aaron Shaw et al.

The promise of human behavioral simulation--general-purpose computational agents that replicate human behavior across domains--could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals--applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior.

10.4 | SE | Feb 27, 2021 | Code

Underproduction: An Approach for Measuring Risk in Open Source Software

Kaylea Champion, Benjamin Mako Hill

The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call 'underproduction' which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis both demonstrates the utility of our approach and reveals the existence of widespread underproduction in a range of widely installed software components in Debian.

2.9 | HC | Feb 11, 2022

The Risks, Benefits, and Consequences of Prepublication Moderation: Evidence from 17 Wikipedia Language Editions

Chau Tran, Kaylea Champion, Benjamin Mako Hill et al.

Many online communities rely on postpublication moderation where contributors, even those that are perceived as being risky, are allowed to publish material immediately and where moderation takes place after the fact. An alternative arrangement involves moderating content before publication. A range of communities have argued against prepublication moderation by suggesting that it makes contributing less enjoyable for new members and that it will distract established community members with extra moderation work. We present an empirical analysis of the effects of a prepublication moderation system called FlaggedRevs that was deployed by several Wikipedia language editions. We used panel data from 17 large Wikipedia editions to test a series of hypotheses related to the effect of the system on activity levels and contribution quality. We found that the system was very effective at keeping low-quality contributions from ever becoming visible. Although there is some evidence that the system discouraged participation among users without accounts, our analysis suggests that the system's effects on contribution volume and quality were moderate at most. Our findings imply that concerns regarding the major negative effects of prepublication moderation systems on contribution quality and project productivity may be overstated.

4.3 | SI | Jan 12, 2022

No Community Can Do Everything: Why People Participate in Similar Online Communities

Nathan TeBlunthuis, Charles Kiene, Isabella Brown et al.

Large-scale quantitative analyses have shown that individuals frequently talk to each other about similar things in different online spaces. Why do these overlapping communities exist? We provide an answer grounded in the analysis of 20 interviews with active participants in clusters of highly related subreddits. Within a broad topical area, there is a diversity of benefits an online community can confer. These include (a) specific information and discussion, (b) socialization with similar others, and (c) attention from the largest possible audience. A single community cannot meet all three needs. Our findings suggest that topical areas within an online community platform tend to become populated by groups of specialized communities with diverse sizes, topical boundaries, and rules. Compared with any single community, such systems of overlapping communities are able to provide a greater range of benefits.

3.6 | SE | Jul 29, 2021

Qualities of Quality: A Tertiary Review of Software Quality Measurement Research

Kaylea Champion, Sejal Khatri, Benjamin Mako Hill

This paper presents a tertiary review of software quality measurement research. To conduct this review, we examined an initial dataset of 7,811 articles and found 75 relevant and high-quality secondary analyses of software quality research. Synthesizing this body of work, we offer an overview of perspectives, measurement approaches, and trends. We identify five distinct perspectives that conceptualize quality as heuristic, as maintainability, as a holistic concept, as structural features of software, and as dependability. We also identify three key challenges. First, we find widespread evidence of validity questions with common measures. Second, we observe the application of machine learning methods without adequate evaluation. Third, we observe the use of aging datasets. Finally, from these observations, we sketch a path toward a theoretical framework that will allow software engineering researchers to systematically confront these weaknesses while remaining grounded in the experiences of developers and the real world in which code is ultimately deployed.

8.6 | HC | Jul 14, 2021

Identifying Competition and Mutualism Between Online Groups

Nathan TeBlunthuis, Benjamin Mako Hill

Platforms often host multiple online groups with overlapping topics and members. How can researchers and designers understand how related groups affect each other? Inspired by population ecology, prior research in social computing and human-computer interaction has studied related groups by correlating group size with degrees of overlap in content and membership, but has produced puzzling results: overlap is associated with competition in some contexts but with mutualism in others. We suggest that this inconsistency results from aggregating intergroup relationships into an overall environmental effect that obscures the diversity of competition and mutualism among related groups. Drawing on the framework of community ecology, we introduce a time-series method for inferring competition and mutualism. We then use this framework to inform a large-scale analysis of clusters of subreddits that all have high user overlap. We find that mutualism is more common than competition.

3.3 | HC | Aug 4, 2020

Designing for Critical Algorithmic Literacies

Sayamindu Dasgupta, Benjamin Mako Hill

As pervasive data collection and powerful algorithms increasingly shape children's experience of the world and each other, their ability to interrogate computational algorithms has become crucially important. A growing body of work has attempted to articulate a set of "literacies" to describe the intellectual tools that children can use to understand, interrogate, and critique the algorithmic systems that shape their lives. Unfortunately, because many algorithms are invisible, only a small number of children develop the literacies required to critique these systems. How might designers support the development of critical algorithmic literacies? Based on our experience designing two data programming systems, we present four design principles that we argue can help children develop literacies that allow them not only to understand how algorithms work, but also to critique and question them.

5.1 | CY | Jun 4, 2020

Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia

Nathan TeBlunthuis, Benjamin Mako Hill, Aaron Halfaker

Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can lead to "overprofiling" bias when moderators focus on these signals but overlook the misbehavior of others. We propose that algorithmic flagging systems deployed to improve the efficiency of moderation work can also make moderation actions more fair to these users by reducing reliance on social signals and making norm violations by everyone else more visible. We analyze moderator behavior in Wikipedia as mediated by RCFilters, a system which displays social signals and algorithmic flags, and estimate the causal effect of being flagged on moderator actions. We show that algorithmically flagged edits are reverted more often, especially those by established editors with positive social signals, and that flagging decreases the likelihood that moderation actions will be undone. Our results suggest that algorithmic flagging systems can lead to increased fairness in some contexts but that the relationship is complex and contingent.

7.7 | HC | Feb 1, 2017

Scratch Community Blocks: Supporting Children as Data Scientists

Sayamindu Dasgupta, Benjamin Mako Hill

In this paper, we present Scratch Community Blocks, a new system that enables children to programmatically access, analyze, and visualize data about their participation in Scratch, an online community for learning computer programming. At its core, our approach involves a shift in who analyzes data: from adult data scientists to young learners themselves. We first introduce the goals and design of the system and then demonstrate it by describing example projects that illustrate its functionality. Next, we show through a series of case studies how the system engages children in not only representing data and answering questions with data but also in self-reflection about their own learning and participation.

17.1 | HC | May 28, 2016

Surviving an "Eternal September" - How an Online Community Managed a Surge of Newcomers

Charles Kiene, Andrés Monroy-Hernández, Benjamin Mako Hill

We present a qualitative analysis of interviews with participants in the NoSleep community within Reddit where millions of fans and writers of horror fiction congregate. We explore how the community handled a massive, sudden, and sustained increase in new members. Although existing theory and stories like Usenet's infamous "Eternal September" suggest that large influxes of newcomers can hurt online communities, our interviews suggest that NoSleep survived without major incident. We propose that three features of NoSleep allowed it to manage the rapid influx of newcomers gracefully: (1) an active and well-coordinated group of administrators, (2) a shared sense of community which facilitated community moderation, and (3) technological systems that mitigated norm violations. We also point to several important trade-offs and limitations.

5.1 | CY | Jul 5, 2015

The Cost of Collaboration for Code and Art: Evidence from a Remixing Community

Benjamin Mako Hill, Andrés Monroy-Hernández

In this paper, we use evidence from a remixing community to evaluate two pieces of common wisdom about collaboration. First, we test the theory that jointly produced works tend to be of higher quality than individually authored products. Second, we test the theory that collaboration improves the quality of functional works like code, but that it works less well for artistic works like images and sounds. We use data from Scratch, a large online community where hundreds of thousands of young users share and remix millions of animations and interactive games. Using peer-ratings as a measure of quality, we estimate a series of fitted regression models and find that collaborative Scratch projects tend to receive ratings that are lower than individually authored works. We also find that code-intensive collaborations are rated higher than media-intensive efforts. We conclude by discussing the limitations and implications of these findings.

7.3 | CY | Jul 5, 2015

The Remixing Dilemma: The Trade-off Between Generativity and Originality

Benjamin Mako Hill, Andrés Monroy-Hernández

In this paper we argue that there is a trade-off between generativity and originality in online communities that support open collaboration. We build on foundational theoretical work in peer production to formulate and test a series of hypotheses suggesting that the generativity of creative works is associated with moderate complexity, prominent authors, and cumulativeness. We also formulate and test three hypotheses that these qualities are associated with decreased originality in resulting derivatives. Our analysis uses a rich data set from the Scratch Online Community, a large website where young people openly share and remix animations and video games. We discuss the implications of this trade-off for the design of peer production systems that support amateur creativity.

9.6 | HC | Jul 5, 2015

Computers Can't Give Credit: How Automatic Attribution Falls Short in an Online Remixing Community

Andrés Monroy-Hernández, Benjamin Mako Hill, Jazmin Gonzalez-Rivero et al.

In this paper, we explore the role that attribution plays in shaping user reactions to content reuse, or remixing, in a large user-generated content community. We present two studies using data from the Scratch online community -- a social media platform where hundreds of thousands of young people share and remix animations and video games. First, we present a quantitative analysis that examines the effects of a technological design intervention introducing automated attribution of remixes on users' reactions to being remixed. We compare this analysis to a parallel examination of "manual" credit-giving. Second, we present a qualitative analysis of twelve in-depth, semi-structured interviews with Scratch participants on the subject of remixing and attribution. Results from both studies suggest that automatic attribution done by technological systems (i.e., the listing of names of contributors) plays a role that is distinct from, and less valuable than, credit which may superficially involve identical information but takes on new meaning when it is given by a human remixer. We discuss the implications of these findings for the designers of online communities and social media platforms.

11.1 | HC | Jul 5, 2015

Responses to remixing on a social media sharing website

Benjamin Mako Hill, Andrés Monroy-Hernández, Kristina R. Olson

In this paper we describe the ways participants of the Scratch online community, primarily young people, engage in remixing of each other's shared animations, games, and interactive projects. In particular, we try to answer the following questions: How do users respond to remixing in a social media environment where remixing is explicitly permitted? What qualities of originators and their projects correspond to a higher likelihood of plagiarism accusations? Is there a connection between plagiarism complaints and similarities between a remix and the work it is based on? Our findings indicate that users have a very wide range of reactions to remixing and that as many users react positively as accuse remixers of plagiarism. We test several hypotheses that might explain the high number of plagiarism accusations related to original project complexity, cumulative remixing, originators' integration into remixing practice, and remixee-remixer project similarity, and find support for the first and last explanations.

1.2 | CY | Jun 30, 2014

WeDo: Exploring Participatory, End-To-End Collective Action

Haoqi Zhang, Andrés Monroy-Hernández, Aaron Shaw et al.

Many celebrate the Internet's ability to connect individuals and facilitate collective action toward a common goal. While numerous systems have been designed to support particular aspects of collective action, few systems support participatory, end-to-end collective action in which a crowd or community identifies opportunities, formulates goals, brainstorms ideas, develops plans, mobilizes, and takes action. To explore the possibilities and barriers in supporting such interactions, we have developed WeDo, a system aimed at promoting simple forms of participatory, end-to-end collective action. Pilot deployments of WeDo illustrate that sociotechnical systems can support automated transitions through different phases of end-to-end collective action, but that challenges, such as the elicitation of leadership and the accommodation of existing group norms, remain.