Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
This paper addresses the challenge of improving LLM performance on mathematical extrapolation, but its contribution is incremental: it applies an existing fine-tuning method to a new synthetic dataset.
The paper tackles the problem of LLMs struggling with complex multi-step mathematical reasoning by fine-tuning on synthetic data for an arithmetical puzzle, achieving a zero-shot pass@1 of 0.44 on in-domain data and of 0.33 and 0.35 on the two out-of-domain tasks.
Large Language Models (LLMs) have shown excellent performance in language understanding, text generation, code synthesis, and many other tasks, but they still struggle with complex multi-step reasoning problems, such as mathematical reasoning. In this paper, through a newly proposed arithmetical puzzle problem, we show that the model can perform well on multi-step reasoning tasks via fine-tuning on high-quality synthetic data. Experimental results with the open-llama-3B model on three different test datasets show that the model not only reaches a zero-shot pass@1 of 0.44 on the in-domain dataset but also demonstrates certain generalization capabilities on the out-of-domain datasets. Specifically, we design two out-of-domain datasets, one extending the numerical range and the other extending the composing components of the arithmetical puzzle problem. The fine-tuned models show encouraging performance on these two far more difficult tasks, with zero-shot pass@1 of 0.33 and 0.35, respectively.
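The pass@1 figures above can be read more concretely with a small sketch of how such a score is typically computed. The abstract does not describe the paper's evaluation code, so this uses the commonly used unbiased pass@k estimator as an assumption; the function names `pass_at_k` and `mean_pass_at_k` and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that are correct
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over a test set given (n, c) pairs, one per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes: one sample per puzzle, 1 = solved, 0 = not solved.
    outcomes = [(1, 1), (1, 0), (1, 0), (1, 1)]
    print(mean_pass_at_k(outcomes, k=1))
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved on the first attempt. Under that assumption, a score of 0.44 would correspond to 44 solved puzzles out of every 100.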