Guiding Language Model Reasoning with Planning Tokens
The work addresses the need for more structured reasoning in LLMs on tasks such as math word problems and question answering, though the contribution is incremental because it builds on existing chain-of-thought methods.
The paper aims to improve the reasoning of large language models by introducing a hierarchical generation scheme in which planning tokens structure the chain-of-thought steps, yielding notable accuracy improvements on math word problem and multihop QA datasets.
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. To encourage a more structured generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add the planning tokens' embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements over standard fine-tuning baselines across three math word problem datasets and one multihop QA dataset.
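The data-side mechanics of this scheme can be illustrated with a minimal sketch. It is not the authors' implementation: the token names (`<plan_k>`), the helper functions, the toy vocabulary, and the hand-assigned plan ids are all hypothetical, and the abstract does not say how a planning token is chosen for each step. The sketch only shows the two operations the abstract describes: adding new embedding rows for planning tokens, and placing one planning token at the start of each reasoning step in the training text.

```python
import random


def add_planning_tokens(vocab, embeddings, num_plans, dim, seed=0):
    """Append `num_plans` hypothetical planning tokens to a toy vocabulary.

    Each new token gets one new, randomly initialized embedding row, which is
    the only parameter growth this sketch models.
    """
    rng = random.Random(seed)
    plan_tokens = []
    for k in range(num_plans):
        token = f"<plan_{k}>"          # assumed naming scheme
        vocab[token] = len(vocab)      # next free token id
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        plan_tokens.append(token)
    return plan_tokens


def annotate_steps(steps, plan_ids, plan_tokens):
    """Prefix every reasoning step with its assigned planning token."""
    if len(steps) != len(plan_ids):
        raise ValueError("need exactly one plan id per reasoning step")
    return [f"{plan_tokens[p]} {step}" for step, p in zip(steps, plan_ids)]


# Toy setup: a two-token vocabulary with 4-dimensional embeddings.
dim = 4
vocab = {"<pad>": 0, "<eos>": 1}
embeddings = [[0.0] * dim for _ in vocab]

plan_tokens = add_planning_tokens(vocab, embeddings, num_plans=3, dim=dim)

# Plan ids are assigned by hand here; how they are inferred is an assumption
# left outside this sketch.
steps = ["3 + 4 = 7", "7 * 2 = 14"]
plan_ids = [0, 1]
annotated = annotate_steps(steps, plan_ids, plan_tokens)
print("\n".join(annotated))
```

In a real fine-tuning setup the annotated steps would serve as training targets, so the model learns to emit a planning token before generating each step. Because only a handful of embedding rows are added, the growth in trainable parameters stays very small relative to the size of the model.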