AI · CL · Jan 8, 2025

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken

Stanford

arXiv:2501.04682v1 · 39.5 · 103 citations · h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses the challenge of achieving more human-like, System 2 reasoning in AI, potentially advancing reasoning capabilities for applications in complex problem-solving, though it appears incremental as it builds on existing CoT methods.

The paper tackles the problem of enhancing reasoning in large language models by proposing Meta Chain-of-Thought (Meta-CoT), a framework that models underlying reasoning processes beyond traditional CoT, with empirical evidence from state-of-the-art models and a training pipeline involving instruction tuning and reinforcement learning.

We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data generation, and search algorithms. We then outline a concrete pipeline for training a model to produce Meta-CoTs, incorporating instruction tuning with linearized search traces and reinforcement learning post-training. Finally, we discuss open research questions, including scaling laws, verifier roles, and the potential for discovering novel reasoning algorithms. This work provides a theoretical and practical roadmap to enable Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in artificial intelligence.
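To make the idea of a "linearized search trace" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions and not the authors' pipeline: the `Node` class, the `linearize` function, the trace wording ("try", "backtrack from", "solved"), and the toy problem are all hypothetical. It flattens a depth-first search over candidate reasoning steps into a single sequence that keeps the dead ends, which is roughly the kind of text a model could be instruction-tuned on.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate reasoning step in a hypothetical search tree."""
    step: str
    is_solution: bool = False
    children: list["Node"] = field(default_factory=list)


def linearize(node: Node, trace: list[str]) -> bool:
    """Depth-first walk that appends every visited step, dead ends included."""
    trace.append(f"try: {node.step}")
    if node.is_solution:
        trace.append("solved")
        return True
    for child in node.children:
        if linearize(child, trace):
            return True
    # No child led to a solution, so record the backtrack explicitly.
    trace.append(f"backtrack from: {node.step}")
    return False


# Toy search tree (an assumption for illustration only).
tree = Node("x^2 = 9", children=[
    Node("guess x = 2"),
    Node("take square roots", children=[
        Node("x = 3 or x = -3", is_solution=True),
    ]),
])

trace: list[str] = []
found = linearize(tree, trace)
print("\n".join(trace))
```

The failed guess and its backtrack stay in the output, so the resulting sequence records the search process itself and not only the final solution path, which is the distinction the abstract draws between Meta-CoT and a plain CoT.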
