Exploring an LM to generate Prolog Predicates from Mathematics Questions
The work addresses a specific bottleneck in AI for mathematical reasoning, but the contribution is incremental because it builds on existing methods such as chain-of-thought prompting.
The paper tackles the poor performance of large language models on mathematics questions that require reasoning. It fine-tunes LLaMA7B to generate Prolog predicates, and the resulting Prolog generation model outperforms a chain-of-thought baseline. The model and the corpus are released publicly.
Recently, there has been a surge of interest in NLP driven by ChatGPT. ChatGPT, a transformer-based generative language model of substantial scale, is versatile in performing various tasks specified in natural language. Nevertheless, large language models often perform poorly on mathematics questions that require reasoning. Prior research has demonstrated that chain-of-thought prompting enhances reasoning capabilities. We investigate whether fine-tuning a model to generate code in Prolog, a logic programming language, and subsequently passing this code to a compiler can further improve accuracy. To this end, we fine-tune LLaMA7B on chain-of-thought as a baseline model and develop further fine-tuned LLaMA7B models that generate Prolog code, Prolog code + chain-of-thought, and chain-of-thought + Prolog code, respectively. The results reveal that the Prolog generation model surpasses the baseline, while the combined generation models do not yield significant improvements. The Prolog corpus based on GSM8K and the corresponding fine-tuned Prolog generation model based on LLaMA7B are released to the research community.
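
To make the pipeline concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a hypothetical model output for an invented GSM8K-style question and uses a tiny stand-in evaluator that handles only goals of the form `Var is Expr` with `+`, `-`, and `*`; in practice the generated code would be passed to a real Prolog system. The names `PROLOG_OUTPUT`, `run_prolog_like`, and `build_target`, as well as the newline separator between the chain-of-thought and Prolog parts, are assumptions made for illustration.

```python
import ast
import operator
import re

# Hypothetical model output for an invented question (not from the released corpus):
# "A shop sells 12 pens on Monday and 3 times as many on Tuesday. How many in total?"
PROLOG_OUTPUT = """
solve(Total) :-
    Monday is 12,
    Tuesday is Monday * 3,
    Total is Monday + Tuesday.
"""

# Only these arithmetic operators are supported by the stand-in evaluator.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_expr(node, env):
    """Evaluate a small arithmetic expression tree against bound variables."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = eval_expr(node.left, env)
        right = eval_expr(node.right, env)
        return OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def run_prolog_like(program, query_var):
    """Stand-in for a Prolog system: evaluates `Var is Expr` goals in order."""
    body = program.split(":-", 1)[1].strip().rstrip(".")
    env = {}
    for goal in body.split(","):
        match = re.fullmatch(r"\s*([A-Z]\w*)\s+is\s+(.+?)\s*", goal, re.S)
        if match is None:
            raise ValueError(f"unsupported goal: {goal!r}")
        var, expr = match.groups()
        env[var] = eval_expr(ast.parse(expr, mode="eval").body, env)
    return env[query_var]


def build_target(cot, prolog, variant):
    """Assemble a fine-tuning target for one of the four model variants."""
    layouts = {
        "cot": [cot],                 # chain-of-thought baseline
        "prolog": [prolog],           # Prolog code only
        "prolog+cot": [prolog, cot],  # Prolog code followed by chain-of-thought
        "cot+prolog": [cot, prolog],  # chain-of-thought followed by Prolog code
    }
    # Assumption: parts are joined with a single newline.
    return "\n".join(part.strip() for part in layouts[variant])


if __name__ == "__main__":
    print(run_prolog_like(PROLOG_OUTPUT, "Total"))
```

The point of the sketch is the division of labour: the language model only has to translate the question into predicates, and the arithmetic is carried out by a deterministic executor rather than by the model itself.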