CL · AI · Apr 17, 2024

Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models

arXiv:2404.11500v1 · 42 citations · h-index: 11 · NAACL
Originality: Incremental advance
AI Analysis

The paper addresses the robustness of mathematical reasoning in large language models, though the advance is incremental because it builds on existing self-consistency methods.

The paper investigates how subtle changes in the wording of mathematical problems affect the solve rates of large language models and finds that the models are sensitive to surface form. It proposes Self-Consistency-over-Paraphrases (SCoP), which diversifies reasoning paths across different surface forms of the problem, and reports gains over vanilla self-consistency on four benchmarks across three models.

This paper studies the relationship between the surface form of a mathematical problem and its solvability by large language models. We find that subtle alterations in the surface form can significantly impact the answer distribution and the solve rate, exposing the language model's lack of robustness and sensitivity to the surface form in reasoning through complex problems. To improve mathematical reasoning performance, we propose Self-Consistency-over-Paraphrases (SCoP), which diversifies reasoning paths from specific surface forms of the problem. We evaluate our approach on four mathematics reasoning benchmarks over three large language models and show that SCoP improves mathematical reasoning performance over vanilla self-consistency, particularly for problems initially deemed unsolvable. Finally, we provide additional experiments and discussion regarding problem difficulty and surface forms, including cross-model difficulty agreement and paraphrasing transferability, and Variance of Variations (VOV) for language model evaluation.
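The idea behind SCoP, as the abstract describes it, can be illustrated with a minimal sketch: generate several paraphrases of a problem, sample answers from each surface form, and take a majority vote over all of them. This is not the authors' implementation. The names `scop_vote`, `paraphrase`, `solve`, `k`, and `n` are hypothetical, the toy paraphraser and solver stand in for real language model calls, and voting only over paraphrases (without the original wording) is an assumption.

```python
from collections import Counter
from typing import Callable, List


def scop_vote(
    problem: str,
    paraphrase: Callable[[str, int], List[str]],
    solve: Callable[[str], str],
    k: int = 4,
    n: int = 2,
) -> str:
    """Majority vote over answers sampled from k paraphrases, n samples each.

    `paraphrase` and `solve` are hypothetical callables standing in for
    language model calls; they are not part of any real library.
    """
    surface_forms = paraphrase(problem, k)
    # Sample n answers per surface form, then pool them for a single vote.
    answers = [solve(form) for form in surface_forms for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins (assumptions for illustration only).
def toy_paraphrase(problem: str, k: int) -> List[str]:
    return [f"{problem} (variant {i})" for i in range(k)]


def toy_solve(form: str) -> str:
    # Pretend one surface form misleads the model into a wrong answer.
    return "5" if form.endswith("(variant 3)") else "4"


answer = scop_vote("What is 2 + 2?", toy_paraphrase, toy_solve)
```

In this toy run three of the four surface forms yield "4" and one yields "5", so the pooled vote returns "4". Vanilla self-consistency would correspond to sampling all answers from the single original wording instead.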

Code Implementations: 1 repo
