CL · Oct 23, 2023

MCC-KD: Multi-CoT Consistent Knowledge Distillation

arXiv:2310.14747v3 · 146 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses efficient knowledge distillation of reasoning ability into smaller models. It is an incremental advance, building on existing chain-of-thought (CoT) prompting and distillation methods.

The paper tackles the transfer of reasoning abilities from large to smaller language models. It proposes MCC-KD, which increases the diversity of the rationales used for distillation and enforces consistency among the answers they lead to, resulting in superior performance on in-distribution datasets and robust generalization to out-of-distribution datasets.

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both diversity and consistency in rationales is challenging. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
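The consistency objective described above (bidirectional KL-divergence between the answer distributions produced under different rationales) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each rationale is assumed to yield logits over a shared, small answer vocabulary; the helper names (`kl_divergence`, `bidirectional_kl`, `consistency_loss`) are hypothetical; and averaging the summed two-direction KL over all rationale pairs is one possible pairing and weighting scheme, not necessarily the one MCC-KD uses.

```python
import math
from itertools import combinations
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turn answer logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same answer vocabulary.

    eps guards against log(0); terms where p_i == 0 contribute nothing.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def bidirectional_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric divergence KL(p || q) + KL(q || p) (summing both directions is an assumption)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def consistency_loss(answer_logits_per_rationale: Sequence[Sequence[float]]) -> float:
    """Hypothetical consistency term for ONE question.

    answer_logits_per_rationale: one list of answer logits per generated rationale.
    Returns the mean bidirectional KL over all pairs of rationales (0.0 if fewer than two).
    """
    dists = [softmax(logits) for logits in answer_logits_per_rationale]
    pairs = list(combinations(dists, 2))
    if not pairs:
        return 0.0
    return sum(bidirectional_kl(p, q) for p, q in pairs) / len(pairs)
```

For example, `consistency_loss([[2.0, 1.0, 0.1]] * 3)` returns 0.0 because all three rationales lead to the same answer distribution, while rationales that favor different answers give a positive loss. Using both KL directions keeps the penalty symmetric, so no single rationale is treated as the fixed target. In an actual training loop this term would be computed on model outputs and combined with the usual cross-entropy loss on the rationales; the weighting between the two is not specified here.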

Code Implementations: 1 repo
