CL · MA · Oct 5, 2023

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

arXiv:2310.03903v3 · 60 citations · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the problem of assessing multi-agent coordination abilities in LLMs, giving AI researchers a benchmark for identifying strengths and weaknesses. The advance is incremental because it builds on existing LLM evaluation frameworks.

This study introduces the LLM-Coordination Benchmark to evaluate large language models (LLMs) in pure coordination settings. It finds that LLM agents excel in scenarios where decisions rely mainly on environmental variables but struggle with tasks requiring theory-of-mind reasoning and joint planning, while remaining robust to unseen partners in zero-shot coordination.

Large Language Models (LLMs) have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from Agentic Coordination experiments reveal that LLM agents excel in multi-agent coordination settings where decision-making primarily relies on environmental variables but face challenges in scenarios requiring active consideration of partners' beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs' Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as agents in pure coordination setups and underscore areas for improvement. Code is available at https://github.com/eric-ai-lab/llm_coordination.
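The CoordQA setup described above reports results per ability, which amounts to computing multiple-choice accuracy grouped by question category. The following is a minimal sketch of that bookkeeping, not the authors' implementation: the `Question` record, the `accuracy_by_category` helper, and the four toy questions are hypothetical and exist only for illustration. The actual data format and scoring code live in the linked repository and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record layout; the benchmark's real format may differ."""
    category: str  # e.g. "Environment Comprehension", "ToM Reasoning", "Joint Planning"
    answer: str    # gold option label, e.g. "A"


def accuracy_by_category(questions, predictions):
    """Return {category: fraction of questions answered correctly}."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, predicted in zip(questions, predictions):
        total[question.category] += 1
        # Normalise the model's option label before comparing with the gold label.
        if predicted.strip().upper() == question.answer:
            correct[question.category] += 1
    return {category: correct[category] / total[category] for category in total}


# Toy data for illustration only; these are not benchmark questions or results.
toy_questions = [
    Question("Environment Comprehension", "A"),
    Question("ToM Reasoning", "B"),
    Question("ToM Reasoning", "C"),
    Question("Joint Planning", "D"),
]
toy_predictions = ["A", "B", "A", "d"]

scores = accuracy_by_category(toy_questions, toy_predictions)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Grouping by category rather than reporting a single overall accuracy is what lets a benchmark of this kind separate environment comprehension from ToM reasoning and joint planning.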

Code Implementations: 1 repo
