AI · LG · MA · Oct 29, 2023

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

arXiv:2310.18940v4 · 154 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the challenge of developing strategic language agents for complex social deduction games, offering an incremental improvement over pure LLM-based methods.

The paper targets the intrinsic bias that large language model (LLM)-based agents show in complex decision-making tasks. It proposes a reinforcement learning (RL) framework to strengthen strategic play, and the resulting agents outperform existing LLM-based agents and reach human-level performance in the Werewolf game.

Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model's training data and results in suboptimal performance. To develop strategic language agents, i.e., agents that generate flexible language actions and possess strong decision-making abilities, we propose a novel framework that powers LLM-based agents with reinforcement learning (RL). We consider Werewolf, a popular social deduction game, as a challenging testbed that emphasizes versatile communication and strategic gameplay. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Then an RL policy trained to optimize the decision-making ability chooses an action from the candidates to play in the game. Extensive experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game. We also conduct human-agent experiments and find that our agents achieve human-level performance and demonstrate strong strategic play.
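The two-stage decision described in the abstract (an LLM proposes candidate actions, then a learned policy picks one) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in, not the paper's implementation: `propose_candidates` returns a fixed list in place of a real LLM call, `toy_embed` is a keyword-based placeholder for a text embedding, and the linear scoring with hand-set `weights` stands in for a trained RL policy.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def propose_candidates(observation: str) -> List[str]:
    # Hypothetical stand-in for the LLM step: a real agent would prompt an LLM
    # with the observation and ask for several diverse candidate actions.
    return [
        "accuse player 3 of being a werewolf",
        "defend player 5",
        "stay silent this round",
    ]


def toy_embed(text: str) -> List[float]:
    # Hypothetical keyword features; an assumption used only to keep the
    # sketch self-contained (a real system would use a learned embedding).
    return [float(word in text) for word in ("accuse", "defend", "silent")]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over raw candidate scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(
    candidates: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    weights: Sequence[float],
    rng: random.Random,
) -> Tuple[str, List[float]]:
    # Score each candidate with a linear policy head, turn the scores into a
    # probability distribution, and sample one candidate from it.
    scores = [sum(w * x for w, x in zip(weights, embed(c))) for c in candidates]
    probs = softmax(scores)
    index = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[index], probs


candidates = propose_candidates("Night 1 is over; player 3 was unusually quiet.")
# Hand-set weights for illustration; in an RL setup these would be learned
# from game outcomes rather than chosen by hand.
action, probs = choose_action(candidates, toy_embed, [2.0, 0.0, -1.0], random.Random(0))
print(action, [round(p, 3) for p in probs])
```

The point of the split is that the language model is only responsible for producing plausible options, while the final choice comes from a separate policy whose preferences can be shaped by reward instead of by the model's training data.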

