AI · CL · Aug 20, 2024

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

arXiv:2408.10635v3 · 13 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving LLM decision-making in complex, multi-turn games. It offers a generalizable solution that requires no training data, although the contribution is incremental because it combines existing methods.

The paper tackles the problem of LLMs struggling with complex planning and decision-making by introducing STRATEGIST, a framework that combines LLMs for high-level strategy generation with Monte Carlo Tree Search for execution, achieving superior performance over traditional RL and other LLM-based methods in competitive games like GOPS and Avalon.
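To make the bi-level split concrete, the sketch below shows how a high-level strategy could guide a low-level search for a single GOPS bid. It is a simplified illustration under stated assumptions, not the paper's implementation. The strategy is represented here, by assumption, as a small Python value heuristic. The "tree" is flattened to one level: UCB1 chooses among the candidate cards, and each choice is scored by short random rollouts evaluated with the heuristic. The opponent is modelled as playing uniformly at random, tied bids discard the prize, and all function names (`value_heuristic`, `choose_card`, etc.) are hypothetical.

```python
import math
import random

def resolve(my_card, opp_card, prize):
    """Return (my_points, opp_points) for one GOPS round; ties discard the prize (assumed rule)."""
    if my_card > opp_card:
        return prize, 0
    if opp_card > my_card:
        return 0, prize
    return 0, 0

def value_heuristic(my_score, opp_score, prizes_left):
    """Hypothetical high-level strategy: score lead, damped by prize value still unclaimed."""
    lead = my_score - opp_score
    return lead / (abs(lead) + sum(prizes_left) + 1)  # always in (-1, 1)

def simulate(card, my_hand, opp_hand, prize, prizes_left, heuristic, depth, rng):
    """Play `card` vs a random opponent card, roll out up to `depth` random rounds, then score."""
    mine = [x for x in my_hand if x != card]
    theirs = list(opp_hand)
    opp_card = rng.choice(theirs)
    theirs.remove(opp_card)
    my_score, opp_score = resolve(card, opp_card, prize)
    deck = list(prizes_left)
    rng.shuffle(deck)  # future prize order is unknown to both players
    for _ in range(min(depth, len(deck))):
        p = deck.pop()
        a, b = rng.choice(mine), rng.choice(theirs)
        mine.remove(a)
        theirs.remove(b)
        da, db = resolve(a, b, p)
        my_score += da
        opp_score += db
    return heuristic(my_score, opp_score, deck)  # heuristic judges the unfinished game

def choose_card(my_hand, opp_hand, prize, prizes_left, heuristic=value_heuristic,
                iterations=2000, depth=2, c=1.4, rng=None):
    """Flat MCTS: UCB1 over root actions with heuristic-scored rollouts; returns most-visited card."""
    rng = random.Random(0) if rng is None else rng
    visits = {card: 0 for card in my_hand}
    totals = {card: 0.0 for card in my_hand}

    def ucb(card, t):
        if visits[card] == 0:
            return math.inf  # try every card at least once
        return totals[card] / visits[card] + c * math.sqrt(math.log(t) / visits[card])

    for t in range(1, iterations + 1):
        card = max(my_hand, key=lambda k: ucb(k, t))
        value = simulate(card, my_hand, opp_hand, prize, prizes_left, heuristic, depth, rng)
        visits[card] += 1
        totals[card] += value
    return max(my_hand, key=lambda k: visits[k])
```

For example, `choose_card([2, 9], [1, 3], 10, [1])` should select 9, because bidding 9 wins the 10-point prize against either opponent card. Swapping in a different `heuristic` changes how the search ranks bids without touching the search code, which is the kind of separation between strategy and execution the summary describes.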

Traditional reinforcement learning and planning typically require vast amounts of data and training to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making in complex action spaces. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to search and update high-level strategies (as text), which are then refined and executed by low-level Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that optimizes the strategy through population-based self-play simulations without the need for any training data. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information, including the Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents equipped with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments, and achieve comparable performance against human players.
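The outer, population-based self-play loop described in the abstract can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' method. A "strategy" is reduced to two numbers `(a, b)` that set a target bid of `a * prize + b`. The LLM's strategy-revision step is replaced by a random perturbation in the hypothetical `propose_revision`. The low-level MCTS is omitted, GOPS is shrunk to 5 cards with tied bids discarding the prize, and selection simply keeps the top `keep` strategies each generation. All names are illustrative.

```python
import random

N = 5  # cards per hand and prize cards; a small GOPS variant (assumption)

def bid(strategy, prize, hand):
    """Hypothetical strategy encoding: aim for a bid of a * prize + b, play the closest card."""
    a, b = strategy
    target = a * prize + b
    return min(hand, key=lambda card: (abs(card - target), card))  # deterministic tie-break

def play_gops(s1, s2, rng, n=N):
    """One game; returns (score1, score2). Tied bids discard the prize (assumed rule)."""
    hand1, hand2 = list(range(1, n + 1)), list(range(1, n + 1))
    prizes = list(range(1, n + 1))
    rng.shuffle(prizes)
    score1 = score2 = 0
    for prize in prizes:
        c1, c2 = bid(s1, prize, hand1), bid(s2, prize, hand2)
        hand1.remove(c1)
        hand2.remove(c2)
        if c1 > c2:
            score1 += prize
        elif c2 > c1:
            score2 += prize
    return score1, score2

def propose_revision(strategy, rng):
    """Placeholder for the LLM improvement step: perturb the parameters at random."""
    a, b = strategy
    return (a + rng.gauss(0, 0.3), b + rng.gauss(0, 0.5))

def evaluate(population, rng, games=20):
    """Round-robin self-play; returns each member's mean result (win 1, tie 0.5, loss 0)."""
    points = [0.0] * len(population)
    for i, si in enumerate(population):
        for j, sj in enumerate(population):
            if i == j:
                continue
            for _ in range(games):
                x, y = play_gops(si, sj, rng)
                points[i] += 1.0 if x > y else 0.5 if x == y else 0.0
    matches = (len(population) - 1) * games
    return [p / matches for p in points]

def strategist_loop(generations=10, pop_size=6, keep=3, seed=0):
    """Evaluate, keep the best `keep` strategies, refill with revisions of survivors."""
    rng = random.Random(seed)
    population = [(rng.uniform(0, 1), rng.uniform(0, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = evaluate(population, rng)
        ranked = [s for _, s in sorted(zip(scores, population), reverse=True)]
        survivors = ranked[:keep]
        children = [propose_revision(rng.choice(survivors), rng)
                    for _ in range(pop_size - keep)]
        population = survivors + children
    scores = evaluate(population, rng)
    return max(zip(scores, population))[1], population
```

Calling `strategist_loop()` returns the highest-scoring `(a, b)` pair from the final round-robin together with the final population. Because all randomness flows through one seeded `random.Random`, a run is reproducible for a given `seed`. No external training data is used: every evaluation signal comes from self-play within the population.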

Code implementations: 1 repo