Social welfare optimisation under institutional reward and punishment

arXiv:2605.3133077.4
AI Analysis

This research provides a welfare-centric framework for designing institutional incentives in multi-agent and AI systems, addressing a gap in existing work that primarily focuses on minimizing cost or maximizing cooperation frequency. It offers concrete analytical insights for designers of such systems.

This paper investigates institutional incentives (rewards and punishments) in social dilemmas with the goal of maximising social welfare, defined as total population payoff minus institutional expenditure. It identifies parameter regimes in which welfare has a single optimal incentive level and regimes in which welfare is non-monotonic in the incentive level, with multiple local optima. It shows that any welfare-maximising incentive is either zero or concentrated around a simple closed-form target, and it provides an efficient algorithm to compute these optima. The study also derives closed-form conditions under which reward outperforms punishment in terms of social welfare for any given budget.

Institutional incentives are widely used to promote cooperation among autonomous, self-regarding agents, from human societies to multi-agent and AI systems. Existing work typically treats incentive design as a bi-objective problem: minimise institutional cost while achieving a high long-run frequency of cooperation. Whether such schemes also maximise social welfare - total population payoff net of institutional expenditure - has remained largely unexplored. We develop a welfare-centric framework for institutional incentives in finite, well-mixed populations playing a social dilemma (Donation Game and Public Goods Game), considering both rewards for cooperators and punishments for defectors. For each mechanism, we derive explicit expressions for expected social welfare and characterise how it depends on incentive efficiency and selection intensity. Analytically, we identify parameter regimes where social welfare has a single optimal incentive level and regimes with qualitative phase transitions, in which welfare becomes non-monotonic with multiple local optima. We prove that any welfare-maximising incentive is either zero or concentrated around a simple closed-form target, and we provide an efficient algorithm to compute these optima. Comparing reward and punishment, we further derive closed-form conditions under which reward outperforms punishment in terms of social welfare for any given budget. Overall, our results reveal a systematic gap between incentives optimised for cost or cooperation frequency and those that maximise welfare.
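To make the welfare definition concrete (total population payoff net of institutional expenditure), here is a minimal sketch for a single population state of the Donation Game. It is illustrative only and is not the paper's expression for expected social welfare, which averages over the long-run dynamics and depends on selection intensity. Everything in it is an assumption: the function names, the parameters `theta` (institutional spend per targeted agent) and `efficiency` (payoff change delivered per unit spent), and the convention that each agent plays once with every other agent, with benefit `b` and cost `c`.

```python
def donation_game_total_payoff(n, k, b, c):
    """Total payoff when k of n agents cooperate (assumed: all-pairs play).

    A cooperator pays c to each of its n - 1 partners; every agent
    receives b from each cooperating partner.
    """
    coop_payoff = (k - 1) * b - (n - 1) * c   # payoff of one cooperator
    defect_payoff = k * b                      # payoff of one defector
    return k * coop_payoff + (n - k) * defect_payoff


def welfare_with_reward(n, k, b, c, theta, efficiency):
    """Welfare in one state when each cooperator is rewarded (hypothetical scheme)."""
    spent = k * theta                          # institutional expenditure
    received = k * efficiency * theta          # payoff added to cooperators
    return donation_game_total_payoff(n, k, b, c) + received - spent


def welfare_with_punishment(n, k, b, c, theta, efficiency):
    """Welfare in one state when each defector is punished (hypothetical scheme)."""
    spent = (n - k) * theta                    # institutional expenditure
    fines = (n - k) * efficiency * theta       # payoff removed from defectors
    return donation_game_total_payoff(n, k, b, c) - fines - spent


if __name__ == "__main__":
    n, k, b, c = 10, 4, 2, 1
    print(donation_game_total_payoff(n, k, b, c))
    print(welfare_with_reward(n, k, b, c, theta=1, efficiency=2))
    print(welfare_with_punishment(n, k, b, c, theta=1, efficiency=2))
```

Under these assumptions, punishment lowers welfare in any fixed state with defectors, because both the fine and the expenditure are subtracted; it can raise welfare only by shifting the population towards cooperation. Reward raises per-state welfare only when `efficiency` exceeds 1. This is one way to see why incentives tuned for cooperation frequency need not maximise welfare.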
