AI CL GT MA · Mar 17, 2025

Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

Kenneth J. K. Ong, Lye Jia Jun, Hieu Minh "Jord" Nguyen, Seong Hah Cho, Natalia Pérez-Campanero Antolín

arXiv:2503.12722v1 · 2 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of aligning AI agents for better coordination in multi-agent systems. The contribution is incremental, as it builds on existing work in personality modeling and representation engineering.

The paper tackles poor cooperation among Large Language Models in multi-agent settings by using representation engineering to steer Big Five personality traits such as Agreeableness and Conscientiousness. It finds that higher levels of these traits improve cooperation but also increase vulnerability to exploitation.
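The summary does not spell out the steering procedure. A common representation-engineering recipe, assumed here rather than taken from the paper, derives a trait direction as the difference between mean hidden-state activations on high-trait and low-trait prompts, then adds a scaled copy of that direction to the model's hidden state at inference time. The sketch below uses plain Python lists as stand-ins for real model activations; `trait_direction`, `steer`, `projection`, and the strength parameter `alpha` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = sum(x * x for x in v) ** 0.5
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def trait_direction(high_acts: Sequence[Sequence[float]],
                    low_acts: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical trait direction: unit-normalized difference of means
    between activations from high-trait and low-trait prompts."""
    hi = mean_vector(high_acts)
    lo = mean_vector(low_acts)
    return normalize([h - l for h, l in zip(hi, lo)])


def steer(hidden: Sequence[float], direction: Sequence[float],
          alpha: float) -> Vector:
    """Shift a hidden state along the trait direction.
    alpha > 0 pushes toward the trait, alpha < 0 pushes away from it."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Sequence[float], direction: Sequence[float]) -> float:
    """How strongly a hidden state expresses the trait direction."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy activations (assumed): 3-dimensional instead of a real model's width.
high = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
low = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
d = trait_direction(high, low)
steered = steer([0.5, 1.0, -1.0], d, alpha=2.0)
print(d, steered, projection(steered, d))
```

In a real model the same addition would be applied to one layer's residual-stream activations (for example via a forward hook), and the choice of layer and `alpha` would need tuning; neither is specified in this summary.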

As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.
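To make the cooperation/exploitation trade-off concrete, here is a minimal Iterated Prisoner's Dilemma simulation. It assumes Axelrod's classic payoffs (temptation T=5, reward R=3, punishment P=1, sucker S=0), which may differ from the paper's setup, and uses fixed strategies as hypothetical stand-ins for steered agents: `always_cooperate` loosely models an unconditionally agreeable agent, not an actual LLM from the paper.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"

# Axelrod's classic payoffs; assumed here, not taken from the paper.
PAYOFFS = {
    (C, C): (3, 3),  # mutual cooperation: both get R
    (C, D): (0, 5),  # cooperator gets S, defector gets T
    (D, C): (5, 0),
    (D, D): (1, 1),  # mutual defection: both get P
}

History = List[Tuple[str, str]]  # (my move, opponent's move) per round
Strategy = Callable[[History], str]


def always_cooperate(history: History) -> str:
    return C


def always_defect(history: History) -> str:
    return D


def tit_for_tat(history: History) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return C if not history else history[-1][1]


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play an IPD match and return the total scores of (a, b)."""
    hist_a: History = []
    hist_b: History = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a), b(hist_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        # Each player sees the history from its own perspective.
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b


print(play(always_cooperate, always_defect))  # (0, 50)
print(play(tit_for_tat, always_defect))       # (9, 14)
print(play(tit_for_tat, tit_for_tat))         # (30, 30)
```

Over 10 rounds against `always_defect`, the unconditional cooperator is fully exploited (0 versus 50), while `tit_for_tat` limits its loss (9 versus 14) and still earns 30 each against a cooperative partner. This is only a toy illustration of why more cooperative dispositions can raise mutual payoffs while also widening exposure to defectors.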

