Token-Budget-Aware LLM Reasoning
The work addresses efficiency and cost concerns for users of LLMs in reasoning tasks, but the contribution is incremental, since it builds on existing CoT methods.
The paper tackles the problem of high token usage in Chain-of-Thought reasoning for large language models by proposing a token-budget-aware framework that dynamically adjusts reasoning tokens based on problem complexity, reducing token costs with only a slight performance reduction.
Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework that dynamically adjusts the number of reasoning tokens based on the reasoning complexity of each problem. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE
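As a rough illustration of the idea described in the abstract (placing a token budget in the prompt and scaling it with problem complexity), the following is a minimal sketch, not the authors' implementation. The function names, the exact prompt wording, and the word-count heuristic used as a stand-in for "reasoning complexity" are all assumptions made for this example; the abstract does not specify how the framework estimates a budget.

```python
def build_budgeted_prompt(question: str, token_budget: int) -> str:
    """Append a CoT instruction that states an explicit token budget.

    The instruction wording is an assumption for illustration only.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {token_budget} tokens:"
    )


def estimate_budget(
    question: str,
    min_budget: int = 16,
    max_budget: int = 512,
    tokens_per_word: int = 4,
) -> int:
    """Hypothetical complexity proxy: scale the budget with question length.

    This heuristic is a placeholder, not the estimator used in the paper.
    The result is clamped to the range [min_budget, max_budget].
    """
    n_words = len(question.split())
    return max(min_budget, min(max_budget, n_words * tokens_per_word))


if __name__ == "__main__":
    q = "A train travels 60 km in 1.5 hours. What is its average speed?"
    budget = estimate_budget(q)
    print(build_budgeted_prompt(q, budget))
```

The resulting string would be sent to whichever LLM client is in use. Because the abstract notes that the choice of budget strongly affects how much the output is actually compressed, any fixed heuristic like the one above would need to be validated empirically.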