CL LG Oct 7, 2025

Prompt reinforcing for long-term planning of large language models

arXiv:2510.05921v1
h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of long-term planning in LLMs for interactive tasks such as dialogue systems, offering an incremental improvement through parameter-free prompt optimisation.

The paper tackles the tendency of large language models (LLMs) to underperform in multi-turn interactions, where they rely on incorrect early assumptions and track user goals poorly. It proposes a prompt optimisation framework inspired by reinforcement learning that modifies only the agent's task instruction prompt. The authors report significant improvement in multi-turn tasks such as text-to-SQL and task-oriented dialogue, and the method generalises across different LLM-based agents.

Large language models (LLMs) have achieved remarkable success in a wide range of natural language processing tasks and can be adapted through prompting. However, they remain suboptimal in multi-turn interactions, often relying on incorrect early assumptions and failing to track user goals over time, which makes such tasks particularly challenging. Prior works in dialogue systems have shown that long-term planning is essential for handling interactive tasks. In this work, we propose a prompt optimisation framework inspired by reinforcement learning, which enables such planning to take place by only modifying the task instruction prompt of the LLM-based agent. By generating turn-by-turn feedback and leveraging experience replay for prompt rewriting, our proposed method shows significant improvement in multi-turn tasks such as text-to-SQL and task-oriented dialogue. Moreover, it generalises across different LLM-based agents and can leverage diverse LLMs as meta-prompting agents. This warrants future research in reinforcement learning-inspired parameter-free optimisation methods.
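The loop described in the abstract (turn-by-turn feedback, experience replay, prompt rewriting) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the names `optimise_prompt`, `run_episode`, `critique_turn` and `rewrite_prompt` are hypothetical, the scoring and keep-the-best rule are assumed, and the toy functions stand in for what would be LLM calls in practice.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# One replay entry: (prompt used, per-turn feedback texts, mean episode score).
Experience = Tuple[str, List[str], float]


def optimise_prompt(
    initial_prompt: str,
    run_episode: Callable[[str], List[str]],                 # hypothetical: agent rollout
    critique_turn: Callable[[str], Tuple[str, float]],       # hypothetical: feedback per turn
    rewrite_prompt: Callable[[str, List[Experience]], str],  # hypothetical: meta-prompting step
    epochs: int = 5,
    buffer_size: int = 32,
    batch_size: int = 4,
    seed: int = 0,
) -> Tuple[str, float]:
    """Parameter-free loop: only the instruction prompt is ever changed."""
    rng = random.Random(seed)
    buffer: Deque[Experience] = deque(maxlen=buffer_size)
    prompt = initial_prompt
    best_prompt, best_score = initial_prompt, float("-inf")

    for _ in range(epochs):
        # 1. Roll out one multi-turn interaction with the current prompt.
        turns = run_episode(prompt)
        # 2. Generate feedback for every turn, not just the final outcome.
        critiques = [critique_turn(turn) for turn in turns]
        feedback = [text for text, _ in critiques]
        score = sum(s for _, s in critiques) / max(len(critiques), 1)
        # 3. Store the experience; old entries fall out when the buffer is full.
        buffer.append((prompt, feedback, score))
        if score > best_score:
            best_prompt, best_score = prompt, score
        # 4. Replay a sample of past experiences to rewrite the prompt.
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
        prompt = rewrite_prompt(prompt, batch)

    return best_prompt, best_score


# Toy stand-ins (assumptions for illustration only; real ones would call LLMs).
def toy_run_episode(prompt: str) -> List[str]:
    first = "confirm goal" if "confirm" in prompt else "guess goal"
    return [first, "answer"]


def toy_critique(turn: str) -> Tuple[str, float]:
    if turn.startswith("guess"):
        return ("Agent assumed the goal; ask the user to confirm it first.", 0.0)
    return ("ok", 1.0)


def toy_rewrite(prompt: str, batch: List[Experience]) -> str:
    needs_confirm = any("confirm" in fb for _, fbs, _ in batch for fb in fbs)
    if needs_confirm and "confirm" not in prompt:
        return prompt + " Always confirm the user's goal before acting."
    return prompt


if __name__ == "__main__":
    print(optimise_prompt("You are a helpful agent.",
                          toy_run_episode, toy_critique, toy_rewrite, epochs=3))
```

In the toy run, the first episode is penalised for guessing the user's goal, the replayed feedback causes the prompt to gain a confirmation instruction, and the next episode scores higher. No model weights are touched at any point, which is the sense in which the approach is parameter-free.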

