LG, CL · Dec 14, 2023

TinyGSM: achieving >80% on GSM8k with small language models

Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

arXiv:2312.09241v1 · 29.166 citations · h-index: 60

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of deploying AI in educational or other resource-constrained settings by showing that small models can match much larger ones on grade school math reasoning. The advance is incremental, since it builds on existing synthetic dataset generation and verification methods.

The paper tackles the problem of getting small language models to high accuracy on grade school math by fine-tuning on a synthetic dataset and adding a verifier. A 1.3B generation model paired with a 1.3B verifier reaches 81.5% accuracy on GSM8K, outperforming much larger models and rivaling GPT-3.5 (77.4%).

Small-scale models offer various computational advantages, yet to what extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break the 80% barrier on the GSM8K benchmark is 34B. Our work studies how high-quality datasets may be the key for small language models to acquire mathematical reasoning. We introduce TinyGSM, a synthetic dataset of 12.3M grade school math problems paired with Python solutions, generated fully by GPT-3.5. After finetuning on TinyGSM, we find that a duo of a 1.3B generation model and a 1.3B verifier model can achieve 81.5% accuracy, outperforming existing models that are orders of magnitude larger. This also rivals the performance of the GPT-3.5 "teacher" model (77.4%), from which our model's training data is generated. Our approach is simple and has two key components: 1) the high-quality dataset TinyGSM, 2) the use of a verifier, which selects the final outputs from multiple candidate generations.
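
The verifier step described in the abstract (pick one final output from several candidate generations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `solution()` entry-point name, the `run_candidate` and `select_answer` helpers, the toy word problem, and the hard-coded scores standing in for a trained verifier model are all assumptions made for the example.

```python
from typing import Callable, List, Optional


def run_candidate(code: str) -> Optional[float]:
    """Run a candidate Python solution and return its answer, or None on failure.

    Assumes (hypothetically) that each candidate defines a `solution()` function.
    """
    namespace: dict = {}
    try:
        # Illustration only: real model output should be executed in a sandbox.
        exec(code, namespace)
        return namespace["solution"]()
    except Exception:
        return None


def select_answer(
    candidates: List[str], score: Callable[[str], float]
) -> Optional[float]:
    """Return the answer of the highest-scoring candidate that runs successfully."""
    best_answer, best_score = None, float("-inf")
    for code in candidates:
        answer = run_candidate(code)
        if answer is None:
            continue  # skip candidates that crash or fail to parse
        candidate_score = score(code)
        if candidate_score > best_score:
            best_answer, best_score = answer, candidate_score
    return best_answer


# Toy problem: 3 boxes of 12 pencils each, 5 pencils given away. How many remain?
candidates = [
    "def solution():\n    return 12 * 3 - 5\n",  # correct reasoning
    "def solution():\n    return 12 + 3 - 5\n",  # wrong reasoning
    "def solution(:\n    return 0\n",            # does not parse
]

# Stand-in for a verifier model: fixed, made-up scores per candidate.
toy_scores = {candidates[0]: 0.9, candidates[1]: 0.2, candidates[2]: 0.5}

answer = select_answer(candidates, lambda code: toy_scores[code])
print(answer)  # 31
```

In this sketch the generator's job is only to produce many candidates; accuracy then depends on how well the scoring function ranks correct solutions above incorrect ones, which is the role the abstract assigns to the 1.3B verifier.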
