CL · Mar 20

A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement

arXiv:2603.20441 · 71.0 · h-index: 6
AI Analysis

The paper addresses inefficient and error-prone self-improvement in LLMs, a concern for both researchers and practitioners. Its approach is novel, but it is an incremental improvement over existing methods.

The paper tackles the trade-off between inference efficiency and accuracy in verification-guided self-improvement for large language models by proposing a training-free regeneration paradigm using contrastive Reflection Memory, which outperforms prior methods on nine benchmarks while maintaining low computational cost.

Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy: iterative verification-rectification is computationally expensive and prone to being trapped in faulty reasoning, while best-of-N selection requires extensive sampling without addressing internal model flaws. We propose a training-free regeneration paradigm that leverages an offline-curated contrastive Reflection Memory (RM) to provide corrective guidance, while regenerating from scratch helps break out of faulty reasoning. At inference time, the method performs RM-guided self-verification followed by a single RM-guided regeneration, avoiding both iterative correction and multi-sample selection. We evaluate our method on nine benchmarks spanning algorithmic, reasoning, symbolic, and domain-specific tasks, using both small- and large-scale LLMs. Experimental results show that our method outperforms prior methods while maintaining low computational cost.
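The inference-time flow described in the abstract (one RM-guided verification, then at most one RM-guided regeneration from scratch) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Reflection` schema, the word-overlap `retrieve` function, the prompt wording, and the `llm` callable are all assumptions made for the example, and the paper's actual memory format and retrieval may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reflection:
    """One contrastive memory entry (hypothetical schema, not from the paper)."""
    task: str    # short description of the problem the entry came from
    wrong: str   # a flawed answer or reasoning trace
    right: str   # the corrected answer
    lesson: str  # what distinguishes the two


def retrieve(memory: List[Reflection], query: str, k: int = 2) -> List[Reflection]:
    """Toy retrieval (assumption): rank entries by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        memory,
        key=lambda r: len(words & set(r.task.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def format_guidance(entries: List[Reflection]) -> str:
    """Render retrieved entries as contrastive hints for the prompt."""
    return "\n".join(
        f"Wrong: {e.wrong} | Right: {e.right} | Lesson: {e.lesson}" for e in entries
    )


def answer_with_rm(
    question: str,
    llm: Callable[[str], str],
    memory: List[Reflection],
    k: int = 2,
) -> str:
    """Draft, verify once with RM guidance, regenerate at most once."""
    draft = llm(f"Question: {question}\nAnswer:")
    guidance = format_guidance(retrieve(memory, question, k))

    verdict = llm(
        f"{guidance}\nQuestion: {question}\nProposed answer: {draft}\n"
        "Is the proposed answer correct? Reply YES or NO."
    )
    if verdict.strip().upper().startswith("YES"):
        return draft

    # Regenerate from scratch: the rejected draft is deliberately left out of
    # the prompt so the model is not anchored to its earlier reasoning.
    return llm(f"{guidance}\nQuestion: {question}\nAnswer:")
```

In this sketch the model is called at most three times per question (draft, verification, one regeneration), which is the property that separates the approach from iterative correction loops and best-of-N sampling. Any callable that maps a prompt string to a completion string can be passed as `llm`.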

