AI, CL, Jul 31, 2025

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks

Meta AI
arXiv:2507.23751v2, 23 citations, h-index: 34
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scalable, effective data generation for training LLMs on both reasoning and instruction-following tasks. It represents an incremental advance over prior synthetic data methods.

The paper tackles the problem of generating high-quality synthetic training data for LLMs by introducing CoT-Self-Instruct, which uses Chain-of-Thought reasoning to generate new examples from seed tasks and then filters them for quality. The resulting data outperforms existing training datasets on MATH500 and other reasoning benchmarks, and surpasses human and standard Self-Instruct data on AlpacaEval 2.0 and Arena-Hard for non-reasoning tasks.

We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on given seed tasks, and then generate a new synthetic example of similar quality and complexity. This is followed by a filtering step that uses automatic metrics to select high-quality data, which is then used for LLM training. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, when evaluated on MATH500, AMC23, AIME24, and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of both human and standard Self-Instruct training data on the AlpacaEval 2.0 and Arena-Hard benchmarks.
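
The two stages described in the abstract (CoT-guided generation from seed tasks, then filtering with an automatic metric) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `NEW TASK:` marker, the `generate` and `solve` callables (stand-ins for LLM calls), and the use of answer agreement across samples as the filtering metric are all hypothetical choices made here, since the abstract does not specify them.

```python
from collections import Counter
from typing import Callable, List, Optional


def build_cot_prompt(seed_tasks: List[str]) -> str:
    """Ask the model to reason about the seeds first, then emit one new task.

    The wording and the 'NEW TASK:' marker are assumptions for illustration.
    """
    seeds = "\n".join(f"Seed {i + 1}: {t}" for i, t in enumerate(seed_tasks))
    return (
        f"{seeds}\n\n"
        "First reason step by step about what makes these tasks well-posed "
        "and challenging. Then write one new task of similar quality and "
        "complexity after the marker 'NEW TASK:'."
    )


def extract_task(completion: str, marker: str = "NEW TASK:") -> Optional[str]:
    """Return the text after the marker, or None if no task was produced."""
    _, found, tail = completion.partition(marker)
    task = tail.strip()
    return task if found and task else None


def agreement_score(answers: List[str]) -> float:
    """Fraction of sampled answers that equal the most common answer."""
    if not answers:
        return 0.0
    return Counter(answers).most_common(1)[0][1] / len(answers)


def synthesize(
    seed_tasks: List[str],
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    solve: Callable[[str], str],     # hypothetical LLM call: task -> final answer
    n_candidates: int = 4,
    n_samples: int = 4,
    threshold: float = 0.5,
) -> List[str]:
    """Generate candidate tasks, then keep those passing the assumed filter."""
    prompt = build_cot_prompt(seed_tasks)
    kept = []
    for _ in range(n_candidates):
        task = extract_task(generate(prompt))
        if task is None:
            continue  # the model did not follow the output format
        answers = [solve(task) for _ in range(n_samples)]
        # Assumed metric: keep a task only if sampled answers mostly agree.
        if agreement_score(answers) >= threshold:
            kept.append(task)
    return kept
```

In practice `generate` and `solve` would wrap calls to a language model, and the agreement filter only makes sense for tasks with a checkable final answer; non-verifiable instruction-following prompts would need a different automatic metric.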

