GN GT LG

Aug 23, 2025

Integrative Experiments Identify How Punishment Impacts Welfare in Public Goods Games

Mohammed Alsobay, David G. Rand, Duncan J. Watts, Abdullah Almaatouq

arXiv:2508.17151v1

1 citation, h-index: 5

Originality: Incremental advance

AI Analysis

This research addresses the long-standing debate on punishment effectiveness in social cooperation, providing nuanced insights for policymakers and researchers in behavioral economics and social sciences.

The study investigated how punishment affects cooperation and welfare in public goods games. It found that punishment consistently increases contributions, but its effect on efficiency ranges from a 43% improvement to a 44% reduction depending on context. Communication was the most predictive feature, followed by contribution framing, contribution type, game length, peer outcome visibility, and the availability of a reward mechanism.

Punishment as a mechanism for promoting cooperation has been studied extensively for more than two decades, but its effectiveness remains a matter of dispute. Here, we examine how punishment's impact varies across cooperative settings through a large-scale integrative experiment. We vary 14 parameters that characterize public goods games, sampling 360 experimental conditions and collecting 147,618 decisions from 7,100 participants. Our results reveal striking heterogeneity in punishment effectiveness: while punishment consistently increases contributions, its impact on payoffs (i.e., efficiency) ranges from dramatically enhancing welfare (up to 43% improvement) to severely undermining it (up to 44% reduction) depending on the cooperative context. To characterize these patterns, we developed models that outperformed human forecasters (laypeople and domain experts) in predicting punishment outcomes in new experiments. Communication emerged as the most predictive feature, followed by contribution framing (opt-out vs. opt-in), contribution type (variable vs. all-or-nothing), game length (number of rounds), peer outcome visibility (whether participants can see others' earnings), and the availability of a reward mechanism. Interestingly, however, most of these features interact to influence punishment effectiveness rather than operating independently. For example, the extent to which longer games increase the effectiveness of punishment depends on whether groups can communicate. Together, our results refocus the debate over punishment from whether or not it "works" to the specific conditions under which it does and does not work. More broadly, our study demonstrates how integrative experiments can be combined with machine learning to uncover generalizable patterns, potentially involving interactions between multiple features, and help generate novel explanations in complex social phenomena.
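To make the distinction between contributions and payoffs concrete, the following is a minimal sketch of one round of a linear public goods game with peer punishment. It is not the authors' experimental design: the payoff rule is the standard textbook form, and the endowment, marginal per-capita return (`mpcr`), punishment cost, punishment impact, and all contribution numbers are hypothetical values chosen only for illustration.

```python
from typing import List, Sequence


def round_payoffs(
    contributions: Sequence[float],
    punishments: Sequence[Sequence[float]],
    endowment: float = 20.0,      # assumed starting tokens per player
    mpcr: float = 0.4,            # assumed marginal per-capita return
    punish_cost: float = 1.0,     # assumed cost per point to the punisher
    punish_impact: float = 3.0,   # assumed loss per point to the target
) -> List[float]:
    """Return each player's payoff for one round.

    punishments[i][j] is the number of punishment points that
    player i assigns to player j.
    """
    n = len(contributions)
    pot_share = mpcr * sum(contributions)  # every player receives this share
    payoffs = []
    for i in range(n):
        spent = punish_cost * sum(punishments[i])
        received = punish_impact * sum(punishments[j][i] for j in range(n))
        payoffs.append(endowment - contributions[i] + pot_share - spent - received)
    return payoffs


# Hypothetical group of four without punishment: one player free-rides.
base_contrib = [10, 10, 10, 0]
no_punish = [[0] * 4 for _ in range(4)]
baseline = round_payoffs(base_contrib, no_punish)

# Hypothetical group with punishment: everyone contributes more, but the
# three high contributors each spend 2 points punishing the fourth player.
with_contrib = [15, 15, 15, 5]
punish = [
    [0, 0, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
punished = round_payoffs(with_contrib, punish)

print("total contributions:", sum(base_contrib), "->", sum(with_contrib))
print("total payoffs:", round(sum(baseline), 2), "->", round(sum(punished), 2))
```

With these made-up numbers, total contributions rise from 30 to 50 while total payoffs fall from about 98 to about 86, because the tokens destroyed by punishment outweigh the gain from the larger pot. Different parameter choices can reverse the sign, which is the kind of context dependence the abstract describes.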
