Mar 14, 2024

Self-Consistency Boosts Calibration for Math Reasoning

arXiv:2403.09849v1 · 28 citations · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses calibration issues for developers using LLMs in math reasoning, though it appears incremental as it builds on existing self-consistency techniques.

The paper aims to improve the calibration of large language models on math reasoning tasks. It designs three off-the-shelf methods based on self-consistency and reports that they bridge model confidence and accuracy better than existing methods on the GSM8K and MathQA benchmarks, using the Mistral and LLaMA2 models.

Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. In evaluations on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and accuracy than existing methods based on p(True) (Kadavath et al., 2022) or logit (Kadavath et al., 2022).
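To make the idea of self-consistency-based confidence concrete, here is a minimal sketch. It is not the paper's implementation. The abstract does not spell out the three methods, so this assumes the simplest possible variant: sample several final answers per question, take the majority answer, and treat the fraction of samples agreeing with it as the confidence. The function names, the sample answers, and the gold labels are all hypothetical. Expected calibration error (ECE) with equal-width bins is used here only as one common way to measure the gap between confidence and accuracy.

```python
from collections import Counter


def self_consistency_confidence(answers):
    """Return (majority answer, fraction of samples agreeing with it).

    Assumption: agreement ratio as confidence is an illustrative choice,
    not necessarily one of the paper's three methods.
    """
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / total * abs(accuracy - avg_conf)
    return ece


# Hypothetical sampled final answers for three questions (made-up data).
samples = [
    ["18", "18", "18", "20"],
    ["7", "9", "7", "5"],
    ["42", "42", "42", "42"],
]
gold = ["18", "9", "42"]  # hypothetical reference answers

preds = [self_consistency_confidence(s) for s in samples]
confidences = [conf for _, conf in preds]
correct = [answer == g for (answer, _), g in zip(preds, gold)]
ece = expected_calibration_error(confidences, correct)
```

In this toy data the second question has a confident-looking majority ("7" with 2 of 4 votes) that is wrong, which is exactly the kind of mismatch a calibration metric is meant to expose. A lower ECE means confidence tracks accuracy more closely.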
