CL · Mar 27

RASPRef: Retrieval-Augmented Self-Supervised Prompt Refinement for Large Reasoning Models

arXiv:2603.27008 · 14.2 · h-index: 2

AI Analysis

For practitioners using reasoning-focused LLMs, RASPRef reduces manual prompt engineering effort by automating prompt refinement without human annotations, though improvements are demonstrated only on mathematical reasoning tasks.

RASPRef improves prompt quality for large reasoning models (e.g., DeepSeek R1, OpenAI o1) by iteratively refining prompts using retrieval-augmented self-supervised signals, achieving better performance on GSM8K-style tasks compared to static prompting.

Recent reasoning-focused language models such as DeepSeek R1 and OpenAI o1 have demonstrated strong performance on structured reasoning benchmarks including GSM8K, MATH, and multi-hop question answering tasks. However, their performance remains highly sensitive to prompt formulation, and designing effective prompts is typically a manual and iterative process that does not scale well across tasks or domains. To address this limitation, we introduce Retrieval-Augmented Self-Supervised Prompt Refinement (RASPRef), a framework that improves prompts without requiring human annotations or task-specific supervision. The approach retrieves relevant examples and previously generated reasoning trajectories, and leverages signals such as multi-sample consistency, verifier feedback, and model-generated critiques to iteratively refine the prompt. Unlike prior approaches that focus primarily on improving model outputs, RASPRef directly treats the prompt as the optimization target and improves it through an iterative retrieval-guided refinement process. Experiments on GSM8K-style mathematical reasoning tasks show that retrieval-guided prompting improves performance compared with a static prompting baseline. We further discuss how retrieval quality, trajectory selection, and self-supervised feedback signals may influence the effectiveness of prompt refinement. These findings suggest that prompt design remains a critical factor for reasoning-oriented language models, and that self-improving prompts offer a practical and scalable strategy for improving reasoning performance.
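The refinement loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`consistency_score`, `retrieve`, `refine_prompt`), the word-overlap retrieval, and the greedy "keep the candidate only if it scores higher" rule are all assumptions made for illustration. Only the multi-sample consistency signal is modeled; verifier feedback and model-generated critiques are left out. The model calls are passed in as plain callables (`sample_fn`, `rewrite_fn`), stubbed here with deterministic fakes so the example runs without any LLM.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def consistency_score(answers: Sequence[str]) -> float:
    """Fraction of sampled answers that agree with the majority answer."""
    if not answers:
        return 0.0
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


def retrieve(query: str, store: Sequence[str], k: int = 2) -> List[str]:
    """Toy lexical retrieval (assumption): rank stored items by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(
        store, key=lambda s: len(q & set(s.lower().split())), reverse=True
    )
    return ranked[:k]


def refine_prompt(
    prompt: str,
    questions: Sequence[str],
    store: Sequence[str],
    sample_fn: Callable[[str, str], List[str]],
    rewrite_fn: Callable[[str, List[str]], str],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Greedy refinement: accept a rewritten prompt only if its mean
    multi-sample consistency over the questions strictly improves."""

    def score(p: str) -> float:
        return sum(consistency_score(sample_fn(p, q)) for q in questions) / len(
            questions
        )

    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        context = retrieve(" ".join(questions), store)
        candidate = rewrite_fn(best, context)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


# --- Deterministic stand-ins for model calls (hypothetical, for the demo) ---
def fake_sample(prompt: str, question: str) -> List[str]:
    # Pretend the model is only consistent when asked to work step by step.
    if "step by step" in prompt:
        return ["42", "42", "42", "42"]
    return ["42", "41", "42", "40"]


def fake_rewrite(prompt: str, context: List[str]) -> str:
    return prompt + " Think step by step. Examples: " + " | ".join(context)


STORE = [
    "Tom has 3 apples and buys 2 more",
    "A train travels 60 miles",
    "Each pen cost 2 dollars total",
]
refined, refined_score = refine_prompt(
    "Solve the problem.", ["apples cost total"], STORE, fake_sample, fake_rewrite
)
```

With these stubs, the first rewrite raises the consistency score and is accepted; later rewrites do not improve it further and are discarded. In a real setting, `sample_fn` would draw several completions from the reasoning model and `rewrite_fn` would ask a model to revise the prompt given the retrieved examples and trajectories.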
