Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies

Hanzhong Zhang, Siyang Song, Jindong Wang

arXiv:2603.23406 · 89.8 · h-index: 3 · Has Code

Predicted impact: top 20% in AI · last 90 days · Originality: Incremental advance

AI Analysis

The paper addresses the fragility of static prompt engineering for dynamic alignment in human-agent hybrid societies, though its methodological approach is incremental.

This paper examines how large language model agents form stable stances and negotiate identities in generative societies. It finds that agents exhibit an innate progressive bias, that rational persuasion aligned with this bias shifts 90% of neutral agents, and that conflicting emotional provocations induce a 40.0% trust-action decoupling rate in advanced models.

While large language models can simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations of static evaluations, this paper proposes a mixed-methods framework combining computational virtual ethnography with quantitative socio-cognitive profiling. Human researchers are embedded in generative multi-agent communities, where they conduct controlled discursive interventions to trace the evolution of collective cognition. To measure how agents internalize and react to these interventions, the paper formalizes three new metrics: Innate Value Bias (IVB), Persuasion Sensitivity, and Trust-Action Decoupling (TAD). Across multiple representative models, agents exhibit endogenous stances that override preset identities, consistently demonstrating an innate progressive bias (IVB > 0). When aligned with these stances, rational persuasion shifts 90% of neutral agents while maintaining high trust. In contrast, conflicting emotional provocations induce a paradoxical 40.0% TAD rate in advanced models, which hypocritically alter their stances despite reporting low trust. Smaller models maintain a 0% TAD rate, strictly requiring trust for any behavioral shift. Furthermore, guided by shared stances, agents use language interactions to actively dismantle assigned power hierarchies and reconstruct self-organized community boundaries. These findings expose the fragility of static prompt engineering and provide a methodological and quantitative foundation for dynamic alignment in human-agent hybrid societies. The official code is available at: https://github.com/armihia/CMASE-Endogenous-Stances
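The abstract names the three metrics but does not give their formulas, so the following Python sketch is only one plausible reading and not the authors' definitions. It assumes each agent has a scalar stance in [-1, 1] (positive meaning progressive) recorded before and after an intervention, plus a self-reported trust score in [0, 1]. The record type, thresholds, and function names are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AgentRecord:
    """One agent's measurements around a single intervention (hypothetical schema)."""
    stance_before: float  # assumed scale: -1 (conservative) .. +1 (progressive)
    stance_after: float
    trust: float          # assumed self-reported trust in the intervener, 0 .. 1


def stance_shifted(r: AgentRecord, shift_threshold: float = 0.2) -> bool:
    # Assumption: a "shift" is a stance change of at least shift_threshold.
    return abs(r.stance_after - r.stance_before) >= shift_threshold


def innate_value_bias(records: list[AgentRecord]) -> float:
    # Assumption: IVB is the mean pre-intervention stance; > 0 reads as progressive.
    return mean(r.stance_before for r in records)


def persuasion_sensitivity(records: list[AgentRecord], neutral_band: float = 0.1) -> float:
    # Assumption: the share of initially neutral agents whose stance shifted.
    neutral = [r for r in records if abs(r.stance_before) <= neutral_band]
    if not neutral:
        return 0.0
    return sum(stance_shifted(r) for r in neutral) / len(neutral)


def tad_rate(records: list[AgentRecord], low_trust: float = 0.5) -> float:
    # Assumption: the share of all exposed agents that shifted stance
    # while reporting trust below low_trust.
    if not records:
        return 0.0
    decoupled = [r for r in records if stance_shifted(r) and r.trust < low_trust]
    return len(decoupled) / len(records)


# Toy data, invented purely for illustration (not results from the paper).
records = [
    AgentRecord(0.0, 0.5, 0.2),    # shifted despite low trust
    AgentRecord(0.0, 0.6, 0.9),    # shifted with high trust
    AgentRecord(0.05, 0.05, 0.1),  # low trust, no shift
    AgentRecord(0.0, 0.4, 0.3),    # shifted despite low trust
    AgentRecord(0.8, 0.8, 0.7),    # not neutral, no shift
]

print(innate_value_bias(records) > 0)
print(persuasion_sensitivity(records))
print(tad_rate(records))
```

Under these assumptions, a model that never shifts without trust would score a TAD rate of 0, matching the pattern the abstract reports for smaller models. The repository linked above is the place to check the actual metric definitions.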
