Emergent social conventions and collective bias in LLM populations
This work addresses the problem of ensuring that AI systems align with human values by studying the social dynamics that emerge among LLM agents, though its exploration of LLM-based simulations is incremental.
The study demonstrated that decentralized populations of LLM agents can spontaneously develop universally adopted social conventions, and that collective biases can emerge during this process even when the individual agents are unbiased. It also showed that committed adversarial minorities can impose alternative conventions on the larger population.
Social conventions are the backbone of social coordination, shaping how individuals form a group. As growing populations of artificial intelligence (AI) agents communicate through natural language, a fundamental question is whether they can bootstrap the foundations of a society. Here, we present experimental results that demonstrate the spontaneous emergence of universally adopted social conventions in decentralized populations of large language model (LLM) agents. We then show how strong collective biases can emerge during this process, even when agents exhibit no bias individually. Last, we examine how committed minority groups of adversarial LLM agents can drive social change by imposing alternative social conventions on the larger population. Our results show that AI systems can autonomously develop social conventions without explicit programming and have implications for designing AI systems that align, and remain aligned, with human values and societal goals.