AI · May 15, 2025

Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

arXiv:2505.09970v2 · 16 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work aims to improve agent performance in task-oriented systems, particularly in practical applications that rely on smaller models. It is incremental in that it builds on existing ReAct capabilities.

The paper aims to improve reasoning and action performance in LLM-based agents. It introduces Pre-Act, a multi-step planning approach that refines its execution plan incrementally after each step. Averaged across five models, Pre-Act improves Action Recall over ReAct by 70% on the Almita dataset. A fine-tuned Llama 3.1 70B model outperforms GPT-4, with a 69.5% improvement in turn-level action accuracy and a 28% improvement in end-to-end goal completion rate on the same dataset.

The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan, along with detailed reasoning, for the given user input. This plan incrementally incorporates previous steps and tool outputs, refining itself after each step is executed until the final response is obtained. Our approach is applicable to both conversational and non-conversational agents. To measure the performance of task-oriented agents comprehensively, we propose a two-level evaluation framework: (1) turn level and (2) end-to-end. Our turn-level evaluation, averaged across five models, shows that Pre-Act outperforms ReAct by 70% in Action Recall on the Almita dataset. While this approach is effective for larger models, smaller models, which are crucial for practical applications where latency and cost are key constraints, often struggle with the complex reasoning tasks required for agentic systems. To address this limitation, we fine-tune relatively small models such as Llama 3.1 (8B & 70B) using the proposed Pre-Act approach. Our experiments show that the fine-tuned 70B model outperforms GPT-4, achieving a 69.5% improvement in action accuracy (turn level) and a 28% improvement in goal completion rate (end-to-end) on the Almita (out-of-domain) dataset.
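
The plan-execute-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` structure, the `run_pre_act` function, the planner signature, and the `lookup_order` tool are all hypothetical names chosen for illustration, and a scripted `toy_planner` stands in for the LLM call that would generate the plan in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One planned step (hypothetical structure, not taken from the paper)."""
    reasoning: str        # why this step is needed
    tool: Optional[str]   # tool to call, or None when this step is the final response
    argument: str = ""    # tool input, or the final answer text when tool is None


# Hypothetical planner signature: (user_input, history) -> plan for the remaining steps.
Planner = Callable[[str, List[dict]], List[Step]]


def run_pre_act(
    user_input: str,
    planner: Planner,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[str, List[dict]]:
    """Re-plan before every action, feeding back previous steps and tool outputs."""
    history: List[dict] = []
    for _ in range(max_steps):
        # The whole multi-step plan is regenerated with everything observed so far.
        plan = planner(user_input, history)
        if not plan:
            raise RuntimeError("planner returned an empty plan")
        step = plan[0]  # execute only the first step of the refreshed plan
        if step.tool is None:
            return step.argument, history  # final response reached
        observation = tools[step.tool](step.argument)
        history.append({
            "reasoning": step.reasoning,
            "tool": step.tool,
            "argument": step.argument,
            "observation": observation,
        })
    raise RuntimeError("step budget exhausted before a final response")


def toy_planner(user_input: str, history: List[dict]) -> List[Step]:
    """Scripted stand-in for an LLM planner (assumption for this example)."""
    if not history:
        return [
            Step("The order status is needed first.", "lookup_order", "A123"),
            Step("Then respond to the user.", None),
        ]
    status = history[-1]["observation"]
    return [Step("The status is known, so respond.", None, f"Order A123 is {status}.")]


# Hypothetical tool registry with a single stubbed tool.
TOOLS = {"lookup_order": lambda order_id: "shipped"}

answer, trace = run_pre_act("Where is my order A123?", toy_planner, TOOLS)
print(answer)
```

In this sketch the planner is called again after every tool call, so the remaining plan can change once an observation arrives. The second call to `toy_planner` shows this by replacing its placeholder final step with a concrete answer built from the tool output.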

