CL · May 23, 2023

Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning

arXiv:2305.13660v2 · 155 citations
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity and noisy annotations in goal-oriented dialogue planning with a training-free approach. The analysis rates it as an incremental advance over existing MCTS methods.

The paper tackles goal-oriented dialogue policy planning by introducing GDP-Zero, which uses Open-Loop MCTS with a large language model as a policy prior, value function, user simulator, and system model, eliminating the need for model training. Results show that GDP-Zero's responses are preferred over ChatGPT up to 59.32% of the time and rated more persuasive in interactive evaluations on the PersuasionForGood task.

Planning for goal-oriented dialogue often requires simulating future dialogue interactions and estimating task progress. Many approaches thus consider training neural networks to perform look-ahead search algorithms such as A* search and Monte Carlo Tree Search (MCTS). However, this training often requires abundant annotated data, which creates challenges when faced with noisy annotations or low-resource settings. We introduce GDP-Zero, an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training. GDP-Zero prompts a large language model to act as a policy prior, value function, user simulator, and system model during the tree search. We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that its responses are preferred over ChatGPT up to 59.32% of the time, and are rated more persuasive than ChatGPT during interactive evaluations.
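The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the four `llm_*` functions are hypothetical stubs standing in for prompted LLM calls, the action names in `ACTIONS` are invented for illustration, and the selection rule is a generic PUCT-style formula. The open-loop aspect is that tree nodes are indexed by the sequence of dialogue acts only, while the utterances along a path are re-sampled on every simulation, because the simulated user is stochastic.

```python
import math
import random

# Hypothetical dialogue acts, chosen only for illustration.
ACTIONS = ["greeting", "logical_appeal", "emotional_appeal", "propose_donation"]


def llm_policy_prior(history):
    """Stub for an LLM prompted to score candidate next dialogue acts."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def llm_system_utterance(history, action, rng):
    """Stub for an LLM prompted to realize a dialogue act as a system turn."""
    return f"[system:{action}]"


def llm_user_response(history, rng):
    """Stub user simulator: a toy stochastic user, not a real model."""
    p_positive = 0.7 if "emotional_appeal" in history[-1] else 0.3
    return "[user:positive]" if rng.random() < p_positive else "[user:negative]"


def llm_value(history):
    """Stub value function in [-1, 1]: share of positive simulated user turns."""
    user_turns = [t for t in history if t.startswith("[user:")]
    if not user_turns:
        return 0.0
    positive = sum(t == "[user:positive]" for t in user_turns)
    return 2.0 * positive / len(user_turns) - 1.0


class Node:
    """A tree node identified by the action sequence leading to it."""

    def __init__(self, prior):
        self.prior = prior                    # action -> prior probability
        self.n = {a: 0 for a in prior}        # action -> visit count
        self.q = {a: 0.0 for a in prior}      # action -> mean backed-up value
        self.children = {}                    # action -> Node


def select_action(node, c_puct):
    """PUCT-style selection: exploit Q, explore in proportion to the prior."""
    total = sum(node.n.values())

    def score(a):
        bonus = c_puct * node.prior[a] * math.sqrt(total + 1) / (1 + node.n[a])
        return node.q[a] + bonus

    return max(node.prior, key=score)


def plan(history, n_simulations=200, c_puct=1.0, seed=0):
    """Run the search and return (most visited root action, root node)."""
    rng = random.Random(seed)
    root = Node(llm_policy_prior(history))
    for _ in range(n_simulations):
        node, h, path = root, list(history), []
        while True:
            a = select_action(node, c_puct)
            path.append((node, a))
            # Open loop: the dialogue continuation is sampled afresh each time.
            h.append(llm_system_utterance(h, a, rng))
            h.append(llm_user_response(h, rng))
            if a not in node.children:
                # Expansion: create the child and stop descending.
                node.children[a] = Node(llm_policy_prior(h))
                break
            node = node.children[a]
        value = llm_value(h)
        # Backup: update the running mean along the visited path.
        for visited, a in path:
            visited.n[a] += 1
            visited.q[a] += (value - visited.q[a]) / visited.n[a]
    return max(root.n, key=root.n.get), root
```

Calling `plan([])` returns the most visited first dialogue act together with the root node, whose `n` and `q` dictionaries hold visit counts and value estimates. In a real system each stub would be replaced by a prompted model call, and sampled utterances would likely be cached per node to limit the number of calls; this sketch re-samples every time for simplicity.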

Code Implementations: 1 repo