MA | AI | CL | CY | LG | Jun 3, 2025

MAEBE: Multi-Agent Emergent Behavior Framework

arXiv:2506.03053v2 | 7 citations | h-index: 25
Originality: Incremental advance
AI Analysis

This paper addresses safety and alignment challenges relevant to AI researchers and developers working with multi-agent systems. The contribution is incremental, since it builds on existing benchmarks and techniques.

The paper tackles the problem of insufficient AI safety evaluations for multi-agent AI ensembles by introducing the MAEBE framework, demonstrating that LLM moral preferences are brittle and shift with framing, and that ensemble behavior is unpredictable due to emergent group dynamics like peer pressure.

Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.
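As a rough illustration of findings (1) and (2), the sketch below shows one way framing sensitivity and ensemble drift could be quantified. It is not the paper's implementation: the 1-7 agreement scale, the function names (`reverse_code`, `framing_shift`, `ensemble_gap`), and all scores are assumptions made up for this example, and the abstract does not specify how the double-inversion technique is scored.

```python
from statistics import mean

LIKERT_MAX = 7  # assumption: 1-7 agreement scale; not stated in the abstract


def reverse_code(score: int, scale_max: int = LIKERT_MAX) -> int:
    """Map agreement with a negated statement back onto the original scale."""
    return scale_max + 1 - score


def framing_shift(original: list[int], inverted: list[int]) -> float:
    """Mean original-framing score minus mean reverse-coded inverted-framing score.

    A value near 0 means the stated preference survives the reframing;
    a large magnitude suggests the preference depends on how the question is asked.
    """
    return mean(original) - mean(reverse_code(s) for s in inverted)


def ensemble_gap(isolated: list[int], ensemble_final: list[int]) -> float:
    """Mean score after group discussion minus mean score of the same agents alone."""
    return mean(ensemble_final) - mean(isolated)


# Hypothetical scores for illustration only; not data from the paper.
original_scores = [6, 6, 5, 6]   # agreement with the statement as written
inverted_scores = [4, 5, 4, 4]   # agreement with the negated statement
print(framing_shift(original_scores, inverted_scores))

isolated_scores = [2, 3, 6, 3]   # each agent answering alone
ensemble_scores = [3, 3, 4, 3]   # the same agents after discussion
print(ensemble_gap(isolated_scores, ensemble_scores))
```

In this toy data the outlier agent (score 6 alone) moves toward the group after discussion, which is the kind of convergence the abstract attributes to peer pressure. A single mean gap hides that per-agent movement, so a fuller analysis would likely also compare the spread of scores before and after interaction.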

