AI · CL · Oct 22, 2024

Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong

arXiv:2410.17195v3 · ICLR

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of non-myopic planning in LLM reasoning and planning tasks and offers an incremental improvement over existing methods.

This paper tackles the problem that myopic autoregressive decoding prevents LLMs from planning reliably and optimally. It proposes Predictive-Decoding, a method based on Model Predictive Control (MPC) that re-weights the LLM's distribution using foresight trajectories, and reports significant improvements on math, coding, and agent tasks while remaining computationally efficient.
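One way to formalize "re-weighting the LLM distribution using foresight trajectories" is the target distribution below. The notation is our assumption for illustration, not taken from the summary: $p_\theta$ is the LLM, $s_t$ the current context, $a_t$ the candidate next step, $a_{t+1:t+T}$ a foresight rollout of $T$ further steps sampled from $p_\theta$, $R$ a trajectory score, and $\tau > 0$ a temperature.

```latex
\tilde{p}(a_t \mid s_t) \;\propto\; p_\theta(a_t \mid s_t)\,
\mathbb{E}_{a_{t+1:t+T} \sim p_\theta(\cdot \mid s_t,\, a_t)}
\left[ \exp\!\big( R(a_{t:t+T}) / \tau \big) \right]
```

As $\tau \to \infty$ the exponential weight tends to 1, so $\tilde{p}$ reduces to ordinary sampling from $p_\theta$; smaller $\tau$ shifts probability toward steps whose foresight rollouts score well, which is the non-myopic effect the method targets.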

Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs struggle to plan reliably and optimally because autoregressive decoding is inherently myopic. This paper revisits LLM reasoning from an optimal-control perspective and proposes a novel method, Predictive-Decoding, that leverages Model Predictive Control to improve planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding is computationally efficient, outperforming search baselines while using fewer computational resources. This study provides insights into optimizing LLM planning capabilities.
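The MPC idea in the abstract (look ahead with sampled foresight, re-weight, commit to one step, re-plan) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical `toy_model` that maps a prefix of discrete steps to a next-step distribution, uses the rollout's own log-likelihood divided by a temperature `tau` as the score (an assumption), and picks a rollout by self-normalized importance resampling. The defaults for `horizon`, `n_rollouts`, and `tau` are illustrative only.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a "model" maps a prefix of steps to {next_step: probability}.
Model = Callable[[Sequence[str]], Dict[str, float]]


def sample_rollout(model: Model, prefix: Sequence[str], horizon: int,
                   rng: random.Random) -> Tuple[List[str], float]:
    """Sample `horizon` future steps from the model; return them with their total log-prob."""
    seq, steps, logp = list(prefix), [], 0.0
    for _ in range(horizon):
        dist = model(seq)
        tokens, probs = zip(*dist.items())
        step = rng.choices(tokens, weights=probs)[0]
        logp += math.log(dist[step])
        steps.append(step)
        seq.append(step)
    return steps, logp


def predictive_decode(model: Model, prefix: Sequence[str], n_steps: int,
                      horizon: int = 3, n_rollouts: int = 16, tau: float = 0.5,
                      seed: int = 0) -> List[str]:
    """MPC-style sketch: plan with sampled foresight, commit only the first step."""
    rng = random.Random(seed)
    seq = list(prefix)
    for _ in range(n_steps):
        rollouts = [sample_rollout(model, seq, horizon, rng) for _ in range(n_rollouts)]
        # Assumed score: the rollout's own log-likelihood, sharpened by temperature tau.
        scores = [logp / tau for _, logp in rollouts]
        top = max(scores)  # subtract the max before exponentiating for numerical stability
        weights = [math.exp(s - top) for s in scores]
        # Importance resampling: choose a rollout with probability proportional to exp(score)...
        chosen, _ = rng.choices(rollouts, weights=weights)[0]
        # ...but execute only its first step (receding horizon), then re-plan next iteration.
        seq.append(chosen[0])
    return seq


def greedy_decode(model: Model, prefix: Sequence[str], n_steps: int) -> List[str]:
    """Myopic baseline: always take the single most probable next step."""
    seq = list(prefix)
    for _ in range(n_steps):
        dist = model(seq)
        seq.append(max(dist, key=dist.get))
    return seq


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical toy 'LLM' in which the locally likeliest first step is a trap."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[0] == "a":
        return {"x": 0.25, "y": 0.25, "z": 0.25, "w": 0.25}  # uncertain future after "a"
    return {"x": 0.97, "y": 0.01, "z": 0.01, "w": 0.01}      # confident future after "b"


if __name__ == "__main__":
    print("greedy:    ", greedy_decode(toy_model, [], n_steps=3))
    print("predictive:", predictive_decode(toy_model, [], n_steps=3))
```

In this toy setting, greedy decoding commits to the tempting first step "a", while the sketch usually commits to "b" because the foresight after "b" is far more confident; the exact sequence depends on the seed. Executing only the first step of the chosen rollout and then re-planning mirrors the receding-horizon structure of MPC.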

