AI · CL · Dec 14, 2023

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

arXiv:2312.08926v2 · 2 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses the problem of improving mathematical reasoning in LLMs for AI and education applications, representing an incremental advancement through a novel agent-based approach.

The paper tackles the challenge of enhancing large language models (LLMs) to solve complex mathematical problems. It proposes a zero-shot agent-based framework called PRER, which decomposes problem solving into planning, reasoning, execution, and reflection. Against GPT-4, reported accuracy rises by 12.3 percentage points on miniF2F (53.9% to 66.2%) and by 9.2 percentage points on MATH (49.8% to 59.0%).

Large language models (LLMs) face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems at once can be arduous for LLMs and leads to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents through meticulous decomposition and modeling of the mathematical reasoning process. Specifically, we propose a formal description of mathematical problem solving and extend LLMs with an agent-based zero-shot framework named $\mathbf{P}$lanner-$\mathbf{R}$easoner-$\mathbf{E}$xecutor-$\mathbf{R}$eflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions of different granularities and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH demonstrate the effectiveness of PRER and the proposed MathAgents, achieving an increase of $12.3\%$ ($53.9\% \to 66.2\%$) on miniF2F, $9.2\%$ ($49.8\% \to 59.0\%$) on MATH, and $13.2\%$ ($23.2\% \to 35.4\%$) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.
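The Planner-Reasoner-Executor-Reflector decomposition described in the abstract can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `SolverState`, `run_prer`, and the `toy_*` functions are hypothetical, and the four roles are deterministic stubs standing in for what would be LLM calls in the paper's setting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SolverState:
    """Hypothetical container for the problem and the accepted steps."""
    problem: str
    rationale: List[str] = field(default_factory=list)  # accepted intermediate steps
    answer: Optional[str] = None


def run_prer(problem, planner, reasoner, executor, reflector, max_rounds=8):
    """Run a plan -> reason -> execute -> reflect loop (illustrative only)."""
    state = SolverState(problem)
    for _ in range(max_rounds):
        action = planner(state)                # Planner: pick the next action
        if action is None:                     # nothing left to do
            break
        draft = reasoner(state, action)        # Reasoner: turn the action into an expression
        result = executor(draft)               # Executor: evaluate the expression
        if reflector(state, draft, result):    # Reflector: accept or reject the step
            state.rationale.append(f"{action}: {draft} = {result}")
            state.answer = result
    return state


# Toy stand-ins (assumptions): a real system would query an LLM in each role.
def toy_planner(state):
    pool = ["multiply", "add"]                 # a tiny fixed "pool of actions"
    done = len(state.rationale)
    return pool[done] if done < len(pool) else None


def toy_reasoner(state, action):
    if action == "multiply":
        return "3 * 4"
    return f"{state.answer} + 5"               # reuse the previous intermediate result


def toy_executor(draft):
    left, op, right = draft.split()
    a, b = int(left), int(right)
    return str(a * b if op == "*" else a + b)


def toy_reflector(state, draft, result):
    return result.lstrip("-").isdigit()        # accept only well-formed integers


state = run_prer("3 * 4 + 5", toy_planner, toy_reasoner, toy_executor, toy_reflector)
print(state.answer)
for step in state.rationale:
    print(step)
```

Keeping the four roles as separate callables mirrors the idea of decomposing the solving process: each role can be swapped or refined independently, and a rejected step simply leaves the state unchanged so the planner can try again within the round limit.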

Code Implementations: 1 repo