CL · AI · Jan 22, 2024

Distilling Mathematical Reasoning Capabilities into Small Language Models

Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

arXiv:2401.11864v5 · Neural Networks

Originality: Highly original

AI Analysis

This work democratizes advanced AI by enabling efficient mathematical reasoning in resource-constrained settings, though it builds incrementally on existing distillation methods.

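The authors tackle the challenge of compressing the mathematical reasoning capabilities of Large Language Models into sub-billion-parameter Small Language Models without compromising performance, and report state-of-the-art reasoning performance for these models using two distillation techniques, Equation-of-Thought Distillation (EoTD) and Ensemble Thoughts Distillation (ETD).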

This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to further enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
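The abstract does not specify data formats, so the following is only a hedged sketch of how an EoT rationale could be checked and how an ETD-style mixed dataset could be assembled. It assumes, as the name suggests, that an EoT rationale is a system of linear equations handed to an external solver, and that teacher rationales are kept only when their final answer matches the gold label. The one-equation-per-line format, the `ans` target variable, and every function name (`solve_eot`, `build_etd_dataset`, etc.) are hypothetical; the solver is a stdlib Gauss-Jordan elimination, not necessarily what the authors used, and the SLM fine-tuning step itself is not shown.

```python
import re
from fractions import Fraction

IDENT = re.compile(r"[A-Za-z_]\w*")
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?")  # numeric literals, not digits inside names


def _linear_form(expr, names):
    """Return (coefficients, constant) of a linear expression by probing it.

    The origin gives the constant; each unit vector gives constant + coefficient.
    Literals become exact Fractions so that 1/3 stays 1/3.
    """
    expr = NUMBER.sub(lambda m: f"_F('{m.group()}')", expr)

    def at(point):
        env = {"_F": Fraction, **{n: Fraction(point.get(n, 0)) for n in names}}
        # NOTE: eval on model output is unsafe; an empty __builtins__ is not a sandbox.
        return Fraction(eval(expr, {"__builtins__": {}}, env))

    const = at({})
    coefs = [at({n: 1}) - const for n in names]
    # Cheap check that rejects many non-linear terms such as x*y or x**2.
    if at({n: 2 for n in names}) != const + 2 * sum(coefs):
        raise ValueError(f"not a linear expression: {expr!r}")
    return coefs, const


def solve_eot(equations):
    """Solve 'lhs = rhs' linear equations exactly (Gauss-Jordan elimination)."""
    names = list(dict.fromkeys(n for eq in equations for n in IDENT.findall(eq)))
    rows = []
    for eq in equations:
        lhs, rhs = eq.split("=")
        coefs, const = _linear_form(f"({lhs}) - ({rhs})", names)
        rows.append(coefs + [-const])  # sum(coef * var) = -const
    pivots = []
    for col in range(len(names)):
        r = len(pivots)
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
    if len(pivots) < len(names) or any(row[-1] != 0 for row in rows[len(pivots):]):
        raise ValueError("system has no unique solution")
    return {names[c]: rows[i][-1] for i, c in enumerate(pivots)}


def cot_answer(text):
    """CoT: assume the last number in the rationale is the answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None


def pot_answer(program):
    """PoT: run the program and read the (assumed) variable `ans`. Unsafe on untrusted code."""
    scope = {}
    exec(program, {"__builtins__": {}}, scope)
    return scope.get("ans")


def eot_answer(text):
    """EoT: one equation per line; the target is the (assumed) variable `ans`."""
    return solve_eot([ln for ln in text.splitlines() if ln.strip()]).get("ans")


def build_etd_dataset(samples):
    """Keep each CoT/PoT/EoT rationale whose answer matches the gold label."""
    extractors = {"cot": cot_answer, "pot": pot_answer, "eot": eot_answer}
    dataset = []
    for s in samples:
        for kind, extract in extractors.items():
            try:
                ok = Fraction(str(extract(s[kind]))) == Fraction(str(s["answer"]))
            except Exception:  # unparsable, unsolvable, or crashing rationale
                ok = False
            if ok:
                dataset.append({"question": s["question"], "kind": kind, "target": s[kind]})
    return dataset


sample = {
    "question": "Tom has 3 more apples than Ann; together they have 15. How many does Tom have?",
    "cot": "Ann has (15 - 3) / 2 = 6 apples, so Tom has 6 + 3 = 9. The answer is 9.",
    "pot": "ann = (15 - 3) / 2\nans = ann + 3",
    "eot": "tom = ann + 3\ntom + ann = 15\nans = tom",
    "answer": 9,
}
print([row["kind"] for row in build_etd_dataset([sample])])  # ['cot', 'pot', 'eot']
```

Keeping all three rationale types for the same question, rather than picking one, is what makes the resulting fine-tuning set an "ensemble" in this sketch; a rationale whose answer is wrong or whose equations cannot be solved is simply dropped.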

