CL · Apr 11, 2024

Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs

arXiv:2404.08148v1 · 6 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the problem of improving AI performance on difficult algorithmic tasks, which matters to researchers and developers. The advance appears incremental because it builds on existing distillation methods.

The paper tackles the challenge of improving reasoning abilities in large language models for complex tasks like competitive programming by distilling algorithmic reasoning from explanations of solution programs, resulting in higher solve rates than chain-of-thought baselines.

Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models (LLMs) across various tasks. However, when tackling complex tasks that pose significant challenges for state-of-the-art models, this technique often struggles to produce effective chains of thought that lead to correct answers. In this work, we propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. We apply our method to solving competitive-level programming challenges. More specifically, we employ an LLM to generate explanations for a set of <problem, solution-program> pairs, then use <problem, explanation> pairs to fine-tune a smaller language model, which we refer to as the Reasoner, to learn algorithmic reasoning that can generate "how-to-solve" hints for unseen problems. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder, resulting in higher solve rates than strong chain-of-thought baselines on competitive-level programming problems. It also outperforms models that learn directly from <problem, solution-program> pairs. We curated an additional test set in the CodeContests format, which includes 246 more recent problems posted after the models' knowledge cutoff.
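The data flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SolvedProblem`, `build_reasoner_dataset`, and `solve_with_hint` are hypothetical, the record keys `prompt` and `completion` are an assumed fine-tuning format, and the explainer, Reasoner, and Coder are passed in as plain callables standing in for real models. The one point taken directly from the abstract is that the Reasoner is trained on <problem, explanation> pairs, so the solution program is used only to produce the explanation and never appears in the training target.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SolvedProblem:
    """A <problem, solution-program> pair (hypothetical container)."""
    problem: str
    solution_program: str


def build_reasoner_dataset(
    pairs: List[SolvedProblem],
    explain: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Turn <problem, solution-program> pairs into <problem, explanation> records.

    `explain` stands in for the LLM that explains a known solution.
    The solution program is an input to `explain` only; it is not
    stored in the fine-tuning record.
    """
    records = []
    for pair in pairs:
        explanation = explain(pair.problem, pair.solution_program)
        # Assumed record format: the problem is the prompt, the explanation the target.
        records.append({"prompt": pair.problem, "completion": explanation})
    return records


def solve_with_hint(
    problem: str,
    reasoner: Callable[[str], str],
    coder: Callable[[str, str], str],
) -> str:
    """Inference-time flow: the Reasoner writes a hint, the Coder implements it."""
    hint = reasoner(problem)      # "how-to-solve" hint for an unseen problem
    return coder(problem, hint)   # program produced from problem plus hint
```

With real models, `explain` would call the explaining LLM, `reasoner` would call the fine-tuned smaller model, and `coder` would call the model that writes the final program. Keeping them as callables makes the separation of roles explicit and lets the flow be exercised with stubs.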

