CL, AI · Sep 29, 2023

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

Tsinghua
arXiv:2309.17452v4 · 298 citations · h-index: 66 · Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of mathematical problem-solving for AI systems. Its integration approach is incremental in that it combines existing methods, but it achieves state-of-the-art results among open-source models.

The paper tackles the problem of large language models struggling with complex mathematics by proposing ToRA, a series of Tool-integrated Reasoning Agents that combine natural language reasoning with external tools such as computation libraries and symbolic solvers. The result is a significant performance improvement: ToRA-7B reaches 44.6% accuracy on the MATH dataset, surpassing the best open-source model, WizardMath-70B, by 22% absolute, and ToRA-Code-34B exceeds 50% accuracy on MATH, outperforming GPT-4's chain-of-thought result.
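A "22% absolute" margin is a difference in percentage points, not a ratio. The short calculation below uses only the two figures quoted above to show what that implies. The baseline it prints is derived from a rounded margin, so treat it as an approximation rather than a reported number; the variable names are illustrative.

```python
# Figures quoted in the summary above.
tora_7b_math = 44.6   # ToRA-7B accuracy on MATH, in percent
absolute_gain = 22.0  # margin over the best open-source model, in percentage points (rounded)

# An absolute gain is subtracted, so the baseline is implied, not reported.
implied_baseline = tora_7b_math - absolute_gain

# The same gap expressed relative to the implied baseline.
relative_gain = absolute_gain / implied_baseline

print(f"implied baseline accuracy: {implied_baseline:.1f}%")
print(f"relative improvement: {relative_gain:.0%}")
```

Read this way, the margin means ToRA-7B roughly doubles the implied accuracy of the previous best open-source model on MATH.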

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
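The interleaving of natural language reasoning with external tool calls described in the abstract can be pictured as a loop: the model writes a rationale and a program, the program is executed, and its output is fed back before the model continues. The sketch below is a minimal illustration of that idea, not the authors' implementation. `run_program`, `scripted_model`, and `solve` are hypothetical names, the "model" is a hard-coded stand-in for an LLM, and `exec` is used without any sandboxing.

```python
import contextlib
import io


def run_program(code: str) -> str:
    """Run a Python snippet and return what it printed (illustration only, no sandbox)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # fresh globals for each call
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def scripted_model(trajectory: list[str]) -> dict:
    """Hypothetical stand-in for the language model; a real agent would call an LLM here."""
    if not any(step.startswith("output:") for step in trajectory):
        # First round: reason in words, then delegate the arithmetic to a program.
        return {
            "rationale": "Sum the integers from 1 to 100 with a short program.",
            "program": "print(sum(range(1, 101)))",
        }
    # Later round: read the tool output back and state the final answer.
    last_output = trajectory[-1].removeprefix("output: ")
    return {
        "rationale": f"The program printed {last_output}, so that is the answer.",
        "answer": last_output,
    }


def solve(question: str, model, max_rounds: int = 3):
    """Alternate rationale -> program -> output until the model gives an answer."""
    trajectory = [f"question: {question}"]
    for _ in range(max_rounds):
        step = model(trajectory)
        trajectory.append(f"rationale: {step['rationale']}")
        if "answer" in step:
            return step["answer"], trajectory
        trajectory.append(f"program: {step['program']}")
        trajectory.append(f"output: {run_program(step['program'])}")
    return None, trajectory


answer, trajectory = solve("What is 1 + 2 + ... + 100?", scripted_model)
print(answer)
for step in trajectory:
    print(step)
```

The recorded `trajectory` is the kind of interactive tool-use trace the abstract says is curated for imitation learning. How ToRA actually formats, filters, or trains on such traces is not specified here and is not modelled by this sketch.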

Code Implementations: 1 repo