LG · May 12

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

arXiv:2605.12741 · 97.1
Predicted impact: top 2% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For LLM post-training, this framework improves interaction efficiency in rare-success regimes, a known bottleneck for self-distillation methods.

RESD enables LLMs to learn from rare successes by transforming failure feedback into corrective supervision via retrospective reflections and a global playbook, outperforming standard self-distillation and achieving faster early-stage improvement than GRPO with 8x fewer samples.

Enabling Large Language Models (LLMs) to continuously improve from environmental interactions is a central challenge in post-training. While on-policy self-distillation offers a promising paradigm, existing methods predominantly treat environmental feedback as a passive conditioning signal. Consequently, they rely heavily on successful demonstrations and struggle to learn in rare-success regimes. To bridge this gap, we introduce Reflection-Enhanced Self-Distillation (RESD), a framework that transforms raw failure feedback into an active source of corrective supervision. Instead of passively appending feedback, RESD interprets failed trajectories by generating retrospective reflections to diagnose local errors, and curates a persistent global playbook to preserve reusable lessons across training steps. The enriched context enables the self-teacher to provide actionable token-level supervision even in the absence of successful rollouts. Empirical evaluations on multiple continual learning tasks demonstrate that RESD substantially outperforms standard self-distillation baselines. Furthermore, using only a single rollout per prompt, RESD achieves significantly faster early-stage improvement than GRPO run with $8\times$ as many samples, highlighting its superior interaction efficiency.
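The abstract's pipeline (failed rollout, reflection, playbook, enriched teacher context, token-level supervision) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Playbook`, `build_teacher_context`, and `token_level_kl`, the prompt layout, the playbook size cap, and the choice of KL(teacher || student) as the distillation loss are all assumptions made for illustration. The reflection text is passed in as a plain string, since the abstract does not say how reflections are generated.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Hypothetical persistent store of lessons reused across training steps."""
    lessons: list[str] = field(default_factory=list)
    max_lessons: int = 8  # assumed cap; the abstract gives no size limit

    def add(self, lesson: str) -> None:
        # Skip exact duplicates, then keep only the most recent entries.
        if lesson not in self.lessons:
            self.lessons.append(lesson)
            self.lessons = self.lessons[-self.max_lessons:]


def build_teacher_context(prompt: str, failed_attempt: str, feedback: str,
                          reflection: str, playbook: Playbook) -> str:
    """Assemble the enriched context the self-teacher is conditioned on.

    The section layout is an assumption, not the paper's prompt template.
    """
    lessons = "\n".join(f"- {item}" for item in playbook.lessons) or "- (none yet)"
    return (
        f"Task:\n{prompt}\n\n"
        f"Previous failed attempt:\n{failed_attempt}\n\n"
        f"Environment feedback:\n{feedback}\n\n"
        f"Reflection on the failure:\n{reflection}\n\n"
        f"Playbook lessons:\n{lessons}\n"
    )


def token_level_kl(teacher_probs: list[list[float]],
                   student_probs: list[list[float]],
                   eps: float = 1e-12) -> float:
    """Mean per-token KL(teacher || student) over aligned token positions.

    Each inner list is a probability distribution over the vocabulary at one
    position. The KL direction is an assumption for this sketch.
    """
    total = 0.0
    for p, q in zip(teacher_probs, student_probs):
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / max(len(teacher_probs), 1)
```

In this sketch the student sees only the task prompt, while the teacher (the same model) sees the output of `build_teacher_context`. Minimizing `token_level_kl` then pulls the student toward the feedback-informed distribution even when no rollout has succeeded.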

