AI | CL | HC | Oct 18, 2024

Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning

arXiv:2410.19817v3 | 7 citations | h-index: 2 | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses computational accuracy and data efficiency in mathematical reasoning with LLMs, offering an incremental improvement over existing step-by-step methods.

The paper proposes Step Guided Reasoning, a training-free framework that improves the mathematical reasoning of large language models by having the model reflect on each small reasoning step before taking it. With this method, Qwen2-72B-Instruct outperforms a math-specific counterpart on MMLU-STEM (90.9% vs. 87.3%), and its average score in the math domain rises from 36.5% to 47.4%.

Mathematical reasoning has been challenging for large language models (LLMs), and the introduction of step-by-step Chain-of-Thought (CoT) inference has significantly advanced the mathematical capabilities of LLMs. However, current approaches either necessitate extensive inference datasets for training or depend on few-shot methods that frequently compromise computational accuracy. To address these fundamental limitations, we propose Step Guided Reasoning, a novel training-free adaptation framework that efficiently equips general-purpose pre-trained language models with enhanced mathematical reasoning capabilities. In this approach, LLMs reflect on small reasoning steps, similar to how humans deliberate and focus attention on what to do next. By incorporating this reflective process into the inference stage, LLMs can effectively guide their reasoning from one step to the next. Through extensive experiments, we demonstrate the significant effect of Step Guided Reasoning in enhancing mathematical performance in state-of-the-art language models -- Qwen2-72B-Instruct outperforms its math-specific counterpart, Qwen2.5-72B-Math-Instruct, on MMLU-STEM with a score of 90.9%, compared to 87.3%. The average scores of Qwen2-7B-Instruct and Qwen2-72B-Instruct increase from 27.1% to 36.3% and from 36.5% to 47.4% in the math domain, respectively.
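
The abstract describes an inference-time loop in which the model first reflects on what to do next and then carries out one small reasoning step. The following is a minimal sketch of such a loop, not the authors' implementation. The function name, the prompt templates, the stop marker, and the step limit are all hypothetical, and `generate` stands in for whatever call returns a model completion for a prompt.

```python
from typing import Callable, List

# Hypothetical prompt templates; the paper's actual prompts are not given here.
GUIDANCE_PROMPT = "GUIDANCE\n{context}\nWhat should be done in the next step?"
STEP_PROMPT = "STEP\n{context}\nGuidance: {guidance}\nCarry out only this step."


def step_guided_reasoning(
    question: str,
    generate: Callable[[str], str],  # assumed: prompt in, completion out
    max_steps: int = 8,              # assumed safety limit
    stop_marker: str = "FINAL ANSWER",  # assumed convention for the last step
) -> List[str]:
    """Alternate between guidance generation and one reasoning step."""
    steps: List[str] = []
    for _ in range(max_steps):
        # The context is the question plus all reasoning steps so far.
        context = "\n".join([question] + steps)
        # 1) Reflect: ask the model what the next step should focus on.
        guidance = generate(GUIDANCE_PROMPT.format(context=context))
        # 2) Act: ask the model to perform one step that follows the guidance.
        step = generate(STEP_PROMPT.format(context=context, guidance=guidance))
        steps.append(step)
        if stop_marker in step:
            break
    return steps


if __name__ == "__main__":
    # Stand-in model for a dry run: it finishes on the first step.
    def fake_model(prompt: str) -> str:
        if prompt.startswith("GUIDANCE"):
            return "Add the two numbers."
        return "2 + 2 = 4. FINAL ANSWER: 4"

    for line in step_guided_reasoning("What is 2 + 2?", fake_model):
        print(line)
```

Because the loop only changes how prompts are issued at inference time, it needs no training data or fine-tuning, which matches the training-free framing in the abstract. The cost is two model calls per reasoning step instead of one.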

