SSR: Socratic Self-Refine for Large Language Model Reasoning
SSR addresses the need for more accurate and interpretable reasoning in LLMs and offers a principled black-box approach to evaluating and understanding that reasoning, though the contribution appears incremental because it builds on existing self-refinement methods.
The paper tackles coarse self-verification and self-correction in large language model reasoning. It proposes SSR, a framework that decomposes responses into verifiable (sub-question, sub-answer) pairs for step-level refinement, and reports that SSR consistently outperforms state-of-the-art iterative self-refinement baselines across five reasoning benchmarks and three LLMs.
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, yet existing test-time frameworks often rely on coarse self-verification and self-correction, limiting their effectiveness on complex tasks. In this paper, we propose Socratic Self-Refine (SSR), a novel framework for fine-grained evaluation and precise refinement of LLM reasoning. Our proposed SSR decomposes model responses into verifiable (sub-question, sub-answer) pairs, enabling step-level confidence estimation through controlled re-solving and self-consistency checks. By pinpointing unreliable steps and iteratively refining them, SSR produces more accurate and interpretable reasoning chains. Empirical results across five reasoning benchmarks and three LLMs show that SSR consistently outperforms state-of-the-art iterative self-refinement baselines. Beyond performance gains, SSR provides a principled black-box approach for evaluating and understanding the internal reasoning processes of LLMs. Code is available at https://github.com/SalesforceAIResearch/socratic-self-refine-reasoning.
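The loop described in the abstract (decompose, re-solve each step, check agreement, refine the least reliable step) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. Every name here (`step_confidence`, `find_weakest_step`, `refine`, `toy_resolve`) is hypothetical. Scoring agreement by exact string match and replacing a weak sub-answer with the majority re-solved answer are simplifying assumptions. The `resolve` callable stands in for an LLM call that answers one sub-question given the preceding steps.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (sub_question, sub_answer); hypothetical representation
Resolver = Callable[[List[Step], str], str]  # stand-in for an LLM re-solve call


def step_confidence(sub_answer: str, resolved: List[str]) -> float:
    """Fraction of re-solved answers that exactly match the original sub-answer."""
    if not resolved:
        return 0.0
    target = sub_answer.strip()
    return sum(1 for a in resolved if a.strip() == target) / len(resolved)


def find_weakest_step(
    steps: List[Step], resolve: Resolver, n_samples: int = 5
) -> Optional[Tuple[int, float, str]]:
    """Return (index, confidence, majority_answer) for the least confident step."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    worst: Optional[Tuple[int, float, str]] = None
    for i, (question, answer) in enumerate(steps):
        context = steps[:i]  # re-solve with only the earlier steps as context
        samples = [resolve(context, question) for _ in range(n_samples)]
        conf = step_confidence(answer, samples)
        majority = Counter(s.strip() for s in samples).most_common(1)[0][0]
        if worst is None or conf < worst[1]:
            worst = (i, conf, majority)
    return worst


def refine(
    steps: List[Step],
    resolve: Resolver,
    n_samples: int = 5,
    threshold: float = 1.0,
    max_iters: int = 3,
) -> List[Step]:
    """Repeatedly replace the weakest sub-answer until all steps meet the threshold."""
    steps = list(steps)
    for _ in range(max_iters):
        worst = find_weakest_step(steps, resolve, n_samples)
        if worst is None or worst[1] >= threshold:
            break
        idx, _, majority = worst
        # Assumption: adopt the majority re-solved answer as the refinement.
        steps[idx] = (steps[idx][0], majority)
    return steps


def toy_resolve(context: List[Step], question: str) -> str:
    """Deterministic toy resolver used only to exercise the loop."""
    return {"2+3": "5", "5*4": "20"}[question]
```

With the toy resolver, `refine([("2+3", "5"), ("5*4", "21")], toy_resolve)` leaves the first step unchanged and corrects the second sub-answer to `"20"`. A real system would typically also regenerate the steps that follow a corrected one, which this sketch omits.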