CL | Mar 2

Recursive Think-Answer Process for LLMs and VLMs

arXiv:2603.02099v2 | 1 citation | h-index: 11
AI Analysis

The paper addresses error vulnerability in reasoning models, a problem relevant to AI researchers, though the contribution is incremental because it builds on existing Think-Answer approaches.

The paper tackles output errors in single-pass inference for Think-Answer reasoners such as DeepSeek-R1 by proposing a Recursive Think-Answer Process (R-TAP) that enables iterative reasoning cycles. The resulting models consistently outperform conventional single-pass methods for both LLMs and VLMs, with significantly fewer self-reflective patterns and faster inference.

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues like "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that enables models to engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way toward efficient and elaborate methods for refining the reasoning processes of future AI.
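The abstract does not give the algorithm or the reward formulas, so the following is only a minimal sketch of the idea it describes: repeat think-answer cycles, score each answer with a confidence generator, and reward trajectories whose confidence rises and ends high. Everything named here is an assumption for illustration, not the authors' implementation: the `think_answer` and `confidence` callables, the `threshold` and `max_cycles` parameters, and the specific form of both reward functions.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
#   think_answer(question, previous_answers) -> answer string
#   confidence(question, answer) -> score in [0, 1]
ThinkAnswer = Callable[[str, List[str]], str]
Confidence = Callable[[str, str], float]


def recursive_think_answer(
    question: str,
    think_answer: ThinkAnswer,
    confidence: Confidence,
    threshold: float = 0.9,  # assumed stopping threshold
    max_cycles: int = 4,     # assumed budget of reasoning cycles
) -> Tuple[str, List[float]]:
    """Run think-answer cycles until confidence reaches the threshold."""
    history: List[str] = []
    scores: List[float] = []
    for _ in range(max_cycles):
        answer = think_answer(question, history)
        score = confidence(question, answer)
        history.append(answer)
        scores.append(score)
        if score >= threshold:
            break
    return history[-1], scores


def confidence_increase_reward(scores: List[float]) -> float:
    """Assumed form: fraction of consecutive cycles whose confidence rose."""
    if len(scores) < 2:
        return 0.0
    rises = sum(1 for prev, cur in zip(scores, scores[1:]) if cur > prev)
    return rises / (len(scores) - 1)


def final_answer_confidence_reward(scores: List[float], is_correct: bool) -> float:
    """Assumed form: final confidence, granted only for a correct answer."""
    return scores[-1] if (scores and is_correct) else 0.0


# Toy stand-ins for a model and a confidence generator.
drafts = iter(["5", "4"])
answer, scores = recursive_think_answer(
    "2 + 2 = ?",
    think_answer=lambda q, h: next(drafts),
    confidence=lambda q, a: 0.95 if a == "4" else 0.3,
)
```

In this toy run the first draft scores below the threshold, so a second cycle is triggered and its answer is returned. A real system would replace the two lambdas with model calls, and the rewards would be used during training rather than at inference time.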
