CL · AI · Oct 7, 2025

Mission Impossible: Feedback-Guided Dynamic Interactive Planning for Improving Reasoning on LLMs

arXiv:2510.05577v1 · 4.91 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge language agents face with open-domain problems that require massive information retrieval. It represents a strong, specific gain rather than a foundational advance.

The paper tackles open-domain multi-hop reasoning, where existing approaches struggle because they rely on fixed action sequences. It proposes the FGDIP framework, which uses dynamic strategies to improve reasoning in LLMs, achieving up to 54.47% F1 on HotpotQA and 70.05% on StrategyQA, gains of 5.03% and 7.25% over the best baseline.

Recent advancements in language agents have led to significant improvements in multi-hop reasoning tasks. However, existing approaches often struggle with handling open-domain problems, which require massive information retrieval due to their reliance on a fixed sequence of actions. To address this, we propose Feedback-Guided Dynamic Interactive Planning (FGDIP), a novel framework tailored to enhance reasoning in LLMs by utilizing dynamic and adaptive strategies for information exploration in open-domain multi-hop reasoning tasks. Our approach begins by identifying key entities relevant to the problem, which serve as the initial nodes in the reasoning process. From these initial nodes, we then generate reasoning child nodes with the process being refined through a combination of historical error analysis and real-time feedback, which allows the framework to dynamically adjust and optimize its reasoning strategies. By integrating depth-first search with an innovative node generation technique, our framework adapts based on both prior error paths and concurrently generated nodes at the same hierarchical level. This dynamic strategy effectively expands the search space while ensuring the reasoning process systematically converges toward accurate solutions. Experimental results show that FGDIP achieved up to 54.47% F1 score on the HotpotQA dataset and 70.05% on the StrategyQA dataset, surpassing the best baseline by 5.03% and 7.25% respectively, highlighting its versatility and potential to enhance language agents in multi-hop reasoning tasks.
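The search loop the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Node`, `feedback_guided_dfs`, `propose`, `max_children`, and the toy `GRAPH` are all hypothetical, and the LLM call that would generate child nodes is replaced by a simple lookup. The sketch assumes that "prior error paths" are the paths of failed branches and that "concurrently generated nodes" are the siblings already proposed under the same parent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    state: str                      # entity or intermediate reasoning step
    depth: int
    path: List[str] = field(default_factory=list)


def feedback_guided_dfs(
    initial_entities: List[str],
    propose: Callable[[Node, List[List[str]], List[str]], Optional[str]],
    is_answer: Callable[[str], bool],
    max_depth: int = 3,
    max_children: int = 3,
) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Depth-first search in which each child proposal sees the failed
    paths so far and the siblings already generated at the same level."""
    error_paths: List[List[str]] = []   # history of failed branches

    def visit(node: Node) -> Optional[List[str]]:
        if is_answer(node.state):
            return node.path
        if node.depth >= max_depth:
            error_paths.append(node.path)
            return None
        siblings: List[str] = []
        for _ in range(max_children):
            # Hypothetical stand-in for an LLM call conditioned on feedback.
            candidate = propose(node, error_paths, siblings)
            if candidate is None:
                break
            siblings.append(candidate)
            child = Node(candidate, node.depth + 1, node.path + [candidate])
            found = visit(child)
            if found is not None:
                return found
        error_paths.append(node.path)   # every child failed: record this branch
        return None

    for entity in initial_entities:
        found = visit(Node(entity, 0, [entity]))
        if found is not None:
            return found, error_paths
    return None, error_paths


# Toy knowledge graph standing in for open-domain retrieval (assumption).
GRAPH = {
    "Film X": ["Actor A", "Director B"],
    "Actor A": ["City C"],
    "Director B": ["Country D"],
}


def toy_propose(node: Node, error_paths: List[List[str]], siblings: List[str]) -> Optional[str]:
    """Return the next neighbour that is neither a sibling nor a known failure."""
    failed = {tuple(p) for p in error_paths}
    for nxt in GRAPH.get(node.state, []):
        if nxt in siblings or tuple(node.path + [nxt]) in failed:
            continue
        return nxt
    return None


if __name__ == "__main__":
    path, errors = feedback_guided_dfs(["Film X"], toy_propose, lambda s: s == "Country D")
    print(path, errors)
```

In this toy run the search first exhausts the `Actor A` branch, records it as a failed path, and then reaches the answer through `Director B`. Generating children one at a time is a design choice of the sketch: it lets each new proposal see both the earlier siblings and the errors their subtrees produced.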
