AI CL Aug 20, 2024

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu

arXiv:2408.10635v3 16.013 citations h-index: 21

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of improving LLM decision-making in complex, multi-turn games and offers a generalizable solution that needs no training data, though its contribution is incremental in that it combines existing methods.

The paper tackles the problem of LLMs struggling with complex planning and decision-making by introducing STRATEGIST, a framework that combines LLMs for high-level strategy generation with Monte Carlo Tree Search for execution, achieving superior performance over traditional RL and other LLM-based methods in competitive games like GOPS and Avalon.

Traditional reinforcement learning and planning typically require vast amounts of data and training to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making in complex action spaces. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to search and update high-level strategies (as text), which are then refined and executed by low-level Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that optimizes the strategy through population-based self-play simulations without the need for any training data. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents equipped with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments, and achieve comparable performance against human players.
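
To make the population-based self-play idea concrete, here is a minimal Python sketch of the outer loop on a small GOPS game. It is not the authors' implementation. Everything in it is an assumption made for illustration: a strategy is reduced to a single integer `offset` rather than text, `propose` is a random perturbation standing in for the LLM revision step, the policy plays directly instead of being refined by MCTS, and tied bids simply discard the prize.

```python
import random

N_CARDS = 6  # small deck so simulations stay cheap (assumption)


def make_policy(offset):
    """Hypothetical strategy: bid the hand card closest to prize + offset."""
    def policy(prize, hand):
        target = prize + offset
        # Ties in distance are broken toward the lower card.
        return min(hand, key=lambda c: (abs(c - target), c))
    return policy


def play_gops(policy_a, policy_b, rng):
    """Play one game and return score_a - score_b (tied bids discard the prize)."""
    prizes = list(range(1, N_CARDS + 1))
    rng.shuffle(prizes)
    hand_a = set(range(1, N_CARDS + 1))
    hand_b = set(range(1, N_CARDS + 1))
    diff = 0
    for prize in prizes:
        bid_a = policy_a(prize, hand_a)
        bid_b = policy_b(prize, hand_b)
        hand_a.remove(bid_a)
        hand_b.remove(bid_b)
        if bid_a > bid_b:
            diff += prize
        elif bid_b > bid_a:
            diff -= prize
    return diff


def evaluate(offset, population, rng, games=20):
    """Average score margin of one strategy against every population member."""
    total = 0
    for other in population:
        for _ in range(games):
            total += play_gops(make_policy(offset), make_policy(other), rng)
    return total / (games * len(population))


def propose(parent, rng):
    """Stand-in for the LLM revision step: randomly perturb the parent."""
    return parent + rng.choice([-1, 1])


def improve(generations=5, pop_size=4, seed=0):
    """Grow a population of strategies and keep the best by self-play score."""
    rng = random.Random(seed)
    population = [0]
    for _ in range(generations):
        candidate = propose(rng.choice(population), rng)
        if candidate not in population:
            population.append(candidate)
        ranked = sorted(population,
                        key=lambda s: evaluate(s, population, rng),
                        reverse=True)
        population = ranked[:pop_size]
    return population
```

Calling `improve()` returns the surviving offsets, best first. In a fuller version along the lines the abstract describes, `propose` would prompt an LLM with self-play feedback, and each surviving strategy would guide a tree search rather than act directly.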
