CL | Jan 3, 2025

CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

arXiv:2501.01668v2 | 9 citations | h-index: 11 | Has Code | ACL
Originality Highly original
AI Analysis

The method addresses a limitation of existing inference scaling methods, which fail when all candidate responses are incorrect, and offers a cost-effective way to enhance LLM performance. It is incremental in that it builds on prior inference scaling techniques.

The paper tackles the problem of improving LLM accuracy on complex reasoning tasks by proposing CoT-based Synthesizer, a novel inference scaling strategy that synthesizes answers from multiple flawed candidate responses, achieving gains of 11.8% for Llama3-8B and 10.3% for GPT-4o on the MATH dataset.

Current inference scaling methods, such as Self-consistency and Best-of-N, have proven effective in improving the accuracy of LLMs on complex reasoning tasks. However, these methods rely heavily on the quality of candidate responses and are unable to produce correct answers when all candidates are incorrect. In this paper, we propose a novel inference scaling strategy, CoT-based Synthesizer, which leverages CoT reasoning to synthesize superior answers by analyzing complementary information from multiple candidate responses, even when all candidate responses are flawed. To enable a lightweight and cost-effective implementation, we introduce an automated data generation pipeline that creates diverse training data. This allows smaller LLMs trained on this data to improve the inference accuracy of larger models, including API-based LLMs. Experimental results across four benchmark datasets with seven policy models demonstrate that our method significantly enhances performance, with gains of 11.8% for Llama3-8B and 10.3% for GPT-4o on the MATH dataset. The corresponding training data and code are publicly available on https://github.com/RUCKBReasoning/CoT-based-Synthesizer.
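The limitation the abstract describes can be seen in a minimal sketch, which is not the paper's code. Self-consistency and Best-of-N both select from the candidate pool, so their output is always one of the candidates. The candidate answers, the reference answer, and the toy scorer below are hypothetical stand-ins chosen for illustration; a real Best-of-N setup would use a reward model or verifier as the scorer.

```python
from collections import Counter


def self_consistency(answers):
    """Majority vote over final answers; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, score):
    """Return the candidate with the highest score under a scoring function."""
    return max(answers, key=score)


# Hypothetical final answers from four sampled reasoning chains.
candidates = ["12", "15", "12", "9"]
# Hypothetical reference answer that none of the candidates reaches.
gold = "14"


def toy_score(answer):
    # Stand-in for a reward model (an assumption for this sketch only).
    return -abs(int(answer) - 13)


voted = self_consistency(candidates)
picked = best_of_n(candidates, toy_score)

# Both strategies can only return something already in the pool.
print(voted in candidates, picked in candidates, gold in candidates)
```

Because both functions return an element of `candidates`, neither can recover `gold` here. A synthesizer, by contrast, generates a new response after reading the candidates, so its output is not restricted to the pool.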

Code Implementations: 1 repo
