LG · Sep 4, 2025

RL's Razor: Why Online Reinforcement Learning Forgets Less

arXiv:2509.04259v1 · 109 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses catastrophic forgetting, a practical concern for anyone fine-tuning large models, and offers a principled account of how to preserve existing capabilities. It is incremental, since it builds on existing RL methods.

The paper studies catastrophic forgetting during fine-tuning by comparing reinforcement learning (RL) with supervised fine-tuning (SFT). It finds that RL preserves prior knowledge better because it is biased towards KL-minimal solutions, and its experiments show substantially less forgetting under RL.
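The bias towards KL-minimal solutions can be illustrated with a toy example. This is a hypothetical setup, not the paper's experiments: a single prompt with five candidate answers, where answers 0 and 4 both count as correct and the SFT dataset always labels answer 4. RL is modeled with the exact expected policy gradient on softmax logits rather than a sampled REINFORCE estimate, both runs stop once the probability of a correct answer reaches 0.95, and measuring the shift as KL(base || tuned) is an assumption of this sketch.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for strictly positive categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical base policy over five candidate answers to one prompt.
p_base = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
correct = np.array([1.0, 0.0, 0.0, 0.0, 1.0])  # answers 0 and 4 both solve the task
sft_target = 4  # hypothetical SFT label: the low-probability correct answer

def train(grad_fn, lr=0.5, target_success=0.95, max_steps=10_000):
    theta = np.log(p_base)  # logits that reproduce the base policy exactly
    for _ in range(max_steps):
        pi = softmax(theta)
        if pi @ correct >= target_success:  # stop at a matched success rate
            break
        theta = theta + lr * grad_fn(pi)  # gradient ascent on the logits
    return softmax(theta)

def rl_grad(pi):
    # Exact on-policy gradient of E_{y~pi}[r(y)] w.r.t. the logits:
    # d/dtheta_k = pi_k * (r_k - E_pi[r]).
    return pi * (correct - pi @ correct)

def sft_grad(pi):
    # Gradient of log pi(target) w.r.t. the logits: onehot(target) - pi.
    return np.eye(len(pi))[sft_target] - pi

pi_rl = train(rl_grad)
pi_sft = train(sft_grad)
for name, pi in [("RL", pi_rl), ("SFT", pi_sft)]:
    print(f"{name}: success={pi @ correct:.3f}  KL(base||tuned)={kl(p_base, pi):.3f}")
```

With these particular numbers both runs reach the same success threshold, but RL gets there mostly by raising answer 0, which the base policy already favoured, while SFT moves probability onto answer 4, so the SFT policy ends up further from the base policy in KL.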

Comparison of fine-tuning models with reinforcement learning (RL) and supervised fine-tuning (SFT) reveals that, despite similar performance at a new task, RL preserves prior knowledge and capabilities significantly better. We find that the degree of forgetting is determined by the distributional shift, measured as the KL-divergence between the fine-tuned and base policy evaluated on the new task. Our analysis reveals that on-policy RL is implicitly biased towards KL-minimal solutions among the many that solve the new task, whereas SFT can converge to distributions arbitrarily far from the base model. We validate these findings through experiments with large language models and robotic foundation models and further provide theoretical justification for why on-policy RL updates lead to a smaller KL change. We term this principle RL's Razor: among all ways to solve a new task, RL prefers those closest in KL to the original model.
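The forgetting measure described in the abstract is an average, over new-task inputs, of a per-input KL divergence between the base and fine-tuned policies. The sketch below assumes small tabular policies (each row is one hypothetical new-task prompt, each column one output) and assumes the direction E_x[KL(pi_0(.|x) || pi(.|x))]; the abstract does not state the direction. It compares the exact value with a Monte Carlo estimate that only needs log-probabilities of outputs sampled from the base policy, the form one would use for a language model whose output space cannot be enumerated.

```python
import numpy as np

def exact_forgetting_kl(base, tuned):
    """Mean over new-task prompts of KL(base(.|x) || tuned(.|x)); rows are prompts."""
    return float(np.mean(np.sum(base * np.log(base / tuned), axis=1)))

def mc_forgetting_kl(base, tuned, samples_per_prompt=20_000, seed=0):
    """Monte Carlo estimate: sample y ~ base(.|x), average log base(y|x) - log tuned(y|x).

    Only the log-probabilities of the sampled outputs are used, as would be the
    case for a sequence model (assumption: both policies can score any sample).
    """
    rng = np.random.default_rng(seed)
    per_prompt = []
    for p, q in zip(base, tuned):
        ys = rng.choice(len(p), size=samples_per_prompt, p=p)  # draws from the base policy
        per_prompt.append(np.mean(np.log(p[ys]) - np.log(q[ys])))
    return float(np.mean(per_prompt))

# Hypothetical base and fine-tuned policies on three new-task prompts, four outputs each.
base = np.array([[0.40, 0.30, 0.20, 0.10],
                 [0.25, 0.25, 0.25, 0.25],
                 [0.70, 0.10, 0.10, 0.10]])
tuned = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.10, 0.60, 0.20, 0.10],
                  [0.80, 0.10, 0.05, 0.05]])

print(exact_forgetting_kl(base, tuned), mc_forgetting_kl(base, tuned))
```

Averaging per prompt before averaging across prompts mirrors the expectation over the new-task input distribution; with enough samples per prompt the two numbers should agree closely.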
