CL AI Mar 14, 2024

Self-Consistency Boosts Calibration for Math Reasoning

Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

arXiv:2403.09849v1 15.928 citations Has Code EMNLP

Originality: Incremental advance

AI Analysis

This work addresses calibration issues for developers using LLMs in math reasoning, though it appears incremental as it builds on existing self-consistency techniques.

The paper improves calibration for large language models on math reasoning tasks with three off-the-shelf methods based on self-consistency. On the GSM8K and MathQA benchmarks, using Mistral and LLaMA2, these methods bridge model confidence and accuracy better than existing methods.

Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. In evaluations on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and accuracy than existing methods based on p(True) (Kadavath et al., 2022) or logit (Kadavath et al., 2022).
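The abstract does not spell out the three methods, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes one plausible self-consistency signal: sample several answers to the same problem, take the majority answer, and use the fraction of samples that agree with it as the confidence. The function names and the choice of expected calibration error (ECE) as the metric are assumptions made for illustration.

```python
from collections import Counter


def agreement_confidence(answers):
    """Return (majority_answer, confidence) for a list of sampled final answers.

    Confidence is the fraction of samples that match the majority answer.
    This is an illustrative assumption, not necessarily one of the paper's methods.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical sampled answers for two problems (made-up data for illustration).
samples = [["18", "18", "20", "18"], ["7", "5", "7", "7"]]
gold = ["18", "5"]

predictions = [agreement_confidence(s) for s in samples]
confidences = [conf for _, conf in predictions]
correct = [pred == g for (pred, _), g in zip(predictions, gold)]
print(expected_calibration_error(confidences, correct))
```

A lower ECE means the confidence scores track accuracy more closely. In this toy data both problems get confidence 0.75 but only one is answered correctly, so the score reflects the gap between 0.75 and the observed accuracy of 0.5.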
