Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

arXiv:2602.14471v12.31 citations · h-index: 25

Originality: Highly original

AI Analysis

This paper addresses the negative externalities that arise when LLM agents are deployed in shared environments. It offers a novel framework that improves system-level performance without requiring parameter updates or multi-agent reinforcement learning.

The paper tackles the tension between individual alignment and collective stability in multi-agent LLM systems. It proposes Socially-Weighted Alignment (SWA), which modifies inference-time decisions and induces a critical threshold λ* = (n-β)/(n-1); above this threshold, systems transition from persistent congestion to stable operation near capacity.

Deploying large language model (LLM) agents in shared environments introduces a fundamental tension between individual alignment and collective stability: locally rational decisions can impose negative externalities that degrade system-level performance. We propose Socially-Weighted Alignment (SWA), a game-theoretic framework that modifies inference-time decision making by interpolating between an agent's private objective and an estimate of group welfare via a social weight $λ\in[0,1]$. In a shared-resource congestion game with $n$ agents and congestion severity $β$, we show that SWA induces a critical threshold $λ^*=(n-β)/(n-1)$ above which agents no longer have marginal incentive to increase demand under overload, yielding a phase transition from persistent congestion to stable operation near capacity. We further provide an inference-time algorithmic instantiation of SWA that does not require parameter updates or multi-agent reinforcement learning, and use a multi-agent simulation to empirically validate the predicted threshold behavior.
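The threshold and the interpolation described in the abstract can be illustrated with a minimal Python sketch. Only the formula $λ^*=(n-β)/(n-1)$ comes from the abstract. The convex-combination form of the score and the names `critical_social_weight`, `swa_score`, and `choose_action` are assumptions made for illustration, not the paper's algorithm.

```python
from typing import Sequence


def critical_social_weight(n: int, beta: float) -> float:
    """Threshold stated in the abstract: lambda* = (n - beta) / (n - 1)."""
    if n < 2:
        raise ValueError("the threshold is only defined for n >= 2 agents")
    return (n - beta) / (n - 1)


def swa_score(private_utility: float, group_welfare: float, lam: float) -> float:
    """Assumed interpolation: a convex combination weighted by lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * private_utility + lam * group_welfare


def choose_action(
    private_utilities: Sequence[float],
    welfare_estimates: Sequence[float],
    lam: float,
) -> int:
    """Return the index of the candidate action with the highest assumed SWA score."""
    scores = [
        swa_score(u, w, lam)
        for u, w in zip(private_utilities, welfare_estimates)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with n = 10 agents and β = 4, `critical_social_weight(10, 4.0)` evaluates to 6/9, roughly 0.667. Under the assumed scoring rule, an agent with λ = 0 picks the action that is best for itself, while an agent with λ = 1 picks the action with the highest estimated group welfare.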
