CL | AI | Jan 22, 2024

Distilling Mathematical Reasoning Capabilities into Small Language Models

arXiv:2401.11864v5 | 29 citations | h-index: 14 | Neural Networks
Originality Highly original
AI Analysis

This work democratizes advanced AI by enabling efficient mathematical reasoning in resource-constrained settings, though it is incremental in building on existing distillation methods.

The authors tackled the challenge of compressing advanced Large Language Models' mathematical reasoning capabilities into smaller models without performance loss, achieving state-of-the-art reasoning performance through novel distillation techniques.

This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations, which are used to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. ETD builds a reasoning dataset that combines multiple thought processes, namely Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and uses it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
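
As a rough illustration of how a multi-thought fine-tuning set of this kind might be assembled, the Python sketch below pairs one question with a CoT, a PoT, and an EoT rationale and keeps only those whose answer can be checked against a gold label. It is not the authors' implementation. The answer-checking filter, the `[cot]`/`[pot]`/`[eot]` input tags, the record layout, the toy word problem, and every function name here are assumptions made for illustration, and the equation solver handles only a single linear equation in one unknown.

```python
import re
from fractions import Fraction


def solve_linear(equation, unknown="x"):
    """Solve one linear equation 'lhs = rhs' in a single unknown (toy EoT solver)."""
    lhs, _, rhs = equation.partition("=")

    def residual(value):
        # Evaluate lhs - rhs with the unknown bound to an exact rational value.
        env = {"__builtins__": {}, unknown: Fraction(value)}
        return eval(lhs.strip(), env) - eval(rhs.strip(), env)

    # For a linear residual f(x) = a*x + b, two samples give a and b.
    f0, f1 = residual(0), residual(1)
    if f1 == f0:
        raise ValueError("equation does not determine the unknown")
    return -f0 / (f1 - f0)


def run_program(program):
    """Execute a PoT rationale and read its result from the variable `ans`."""
    scope = {}
    exec(program, {"__builtins__": {}}, scope)
    return scope["ans"]


def last_number(text):
    """Take the last number in a CoT rationale as its final answer."""
    return re.findall(r"-?\d+(?:\.\d+)?", text)[-1]


# Hypothetical mapping from thought type to the routine that extracts its answer.
CHECKERS = {"cot": last_number, "pot": run_program, "eot": solve_linear}


def build_ensemble(question, gold, rationales):
    """Keep each rationale whose extracted answer matches the gold answer."""
    records = []
    for kind, rationale in rationales.items():
        try:
            predicted = Fraction(CHECKERS[kind](rationale))
        except Exception:
            continue  # drop rationales that fail to parse or execute
        if predicted == Fraction(gold):
            records.append({"input": f"[{kind}] {question}", "target": rationale})
    return records


# Toy word problem, invented for this sketch.
question = "Tom has 3 boxes of 4 apples and 5 loose apples. How many apples in total?"
rationales = {
    "cot": "3 boxes of 4 apples is 12 apples. Adding 5 gives 17. The answer is 17.",
    "pot": "boxes = 3 * 4\nans = boxes + 5",
    "eot": "x - 5 = 3 * 4",
}
dataset = build_ensemble(question, 17, rationales)
```

Each surviving record could then serve as one input/target pair for fine-tuning, with the tag telling the small model which style of reasoning to produce. Note that `eval` and `exec` are used here only for brevity; rationales generated by a model should be run in a proper sandbox.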

