AI | May 22, 2025

ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

arXiv:2505.16475v1 | 10 citations | h-index: 6 | Has Code | ACL
Originality: Highly original
AI Analysis

This work addresses the challenge of enhancing reasoning capabilities in small language models (SLMs) without relying on distillation from superior models or fine-grained human annotation, offering a scalable approach for continuous improvement.

The paper introduces ReflectEvo, a pipeline for iterative self-reflection learning that improves meta introspection and reasoning in SLMs. Training on the self-generated reflections raises benchmark performance for Llama-3 from 52.4% to 71.2% and for Mistral from 44.4% to 71.1%.

We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. This process iteratively generates self-reflection for self-training, fostering a continuous and self-evolving process. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building upon this dataset, we demonstrate the effectiveness of reflection learning to improve SLMs' reasoning abilities using SFT and DPO with remarkable performance, substantially boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. It validates that ReflectEvo can rival or even surpass the reasoning capability of the three prominent open-sourced models on BIG-bench without distillation from superior models or fine-grained human annotation. We further conduct a deeper analysis of the high quality of self-generated reflections and their impact on error localization and correction. Our work highlights the potential of continuously enhancing the reasoning performance of SLMs through iterative reflection learning in the long run.
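The abstract describes the loop only at a high level: the model reflects on its own outputs, and those reflections become training data for SFT and DPO. The sketch below illustrates one way such data collection could look; it is not the authors' implementation. The names `Task`, `collect_reflections`, and `make_toy_model`, the prompt wording, the exact-match correctness check, and the rule for pairing helpful with unhelpful reflections are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    question: str
    answer: str


def collect_reflections(
    tasks: List[Task],
    generate: Callable[[str], str],
    num_reflections: int = 2,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Collect self-generated reflections as SFT samples and DPO pairs.

    Assumption: a reflection counts as "helpful" if a retry conditioned
    on it reaches the reference answer (exact string match).
    """
    sft_samples = []  # (prompt, reflection)
    dpo_pairs = []    # (prompt, chosen reflection, rejected reflection)
    for task in tasks:
        first = generate(f"Question: {task.question}\nAnswer:")
        if first.strip() == task.answer:
            continue  # already correct, nothing to reflect on
        reflect_prompt = (
            f"Question: {task.question}\n"
            f"Wrong answer: {first}\n"
            "Explain what went wrong:"
        )
        helpful, unhelpful = [], []
        for _ in range(num_reflections):
            reflection = generate(reflect_prompt)
            retry = generate(
                f"Question: {task.question}\n"
                f"Reflection: {reflection}\nAnswer:"
            )
            bucket = helpful if retry.strip() == task.answer else unhelpful
            bucket.append(reflection)
        sft_samples.extend((reflect_prompt, r) for r in helpful)
        dpo_pairs.extend(
            (reflect_prompt, good, bad) for good in helpful for bad in unhelpful
        )
    return sft_samples, dpo_pairs


def make_toy_model() -> Callable[[str], str]:
    """Deterministic stand-in for an SLM so the sketch runs without a model."""
    reflections = iter(
        ["I added instead of multiplying.", "The question is fine."]
    )

    def generate(prompt: str) -> str:
        if "Reflection:" in prompt:      # retry conditioned on a reflection
            return "12" if "multiplying" in prompt else "7"
        if "Wrong answer:" in prompt:    # request for a reflection
            return next(reflections)
        return "7"                       # first attempt is wrong

    return generate


if __name__ == "__main__":
    sft, dpo = collect_reflections(
        [Task("What is 3 * 4?", "12")], make_toy_model()
    )
    print(len(sft), "SFT sample(s);", len(dpo), "DPO pair(s)")
```

In a real setting `generate` would wrap a sampled call to the small model, and the two returned lists would be converted into whatever format the chosen SFT and DPO trainers expect.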

