LG | Jun 11, 2025

On a few pitfalls in KL divergence gradient estimation for RL

arXiv:2506.09477v1 | 21 citations | h-index: 12 | Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses technical pitfalls that RL practitioners encounter when training LLMs. Its contribution is incremental: it corrects existing methods rather than introducing new ones.

The paper identifies implementation errors in KL divergence gradient estimation for RL training of LLMs, showing that common approaches produce incorrect gradients, and demonstrates the correct implementation method.

We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLMs, as seen in a number of open source projects and papers. The first major pitfall is differentiating through the KL estimate, that is, using the estimate itself as a loss function to minimize KL divergence. We show that such implementations are generally incorrect and do not produce the desired KL gradient. Secondly, we show that some implementations do not account for the sequential nature of the estimation problem and produce a partial gradient at best. We demonstrate the impact of such issues with illustrative tabular and LLM experiments, and show the correct way to implement the KL gradient.
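The first pitfall can be illustrated with a minimal tabular sketch. This is not the paper's experiment: the softmax policy, the reference distribution, and the function names below are all assumptions made for illustration. The sketch assumes the KL estimate is the sampled log-ratio log π(x) − log π_ref(x) with x drawn from π, and it computes expectations exactly over a three-action space rather than by sampling. Differentiating that estimate with the sample held fixed gives ∇ log π(x), whose expectation under π is zero, while weighting the score ∇ log π(x) by the log-ratio recovers the gradient of the KL divergence.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def kl(theta, log_ref):
    """Exact KL(pi_theta || pi_ref) for a softmax policy over a finite action set."""
    p = softmax(theta)
    return float(np.sum(p * (np.log(p) - log_ref)))

def expected_gradients(theta, log_ref):
    """Exact expectations (no sampling) of two candidate gradient estimators."""
    p = softmax(theta)
    log_ratio = np.log(p) - log_ref
    # grad_logp[i, j] = d log p_i / d theta_j = delta_ij - p_j for a softmax policy
    grad_logp = np.eye(len(theta)) - p[None, :]
    # Pitfall: differentiate the sampled estimate log p(x) - log ref(x) with x fixed.
    # The per-sample gradient is grad log p(x); its average over x ~ p is below.
    naive = p @ grad_logp
    # Score-function form: E_x[ log_ratio(x) * grad log p(x) ]
    score = (p * log_ratio) @ grad_logp
    return naive, score

def finite_difference_grad(theta, log_ref, eps=1e-6):
    """Central finite differences of the exact KL, used as ground truth."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (kl(theta + step, log_ref) - kl(theta - step, log_ref)) / (2 * eps)
    return grad

# Hypothetical parameters and reference distribution, chosen only for illustration
theta = np.array([0.5, -1.0, 2.0])
log_ref = np.log(np.array([0.2, 0.5, 0.3]))

naive, score = expected_gradients(theta, log_ref)
exact = finite_difference_grad(theta, log_ref)

print("naive (differentiate the estimate):", naive)
print("score-function form:               ", score)
print("finite-difference KL gradient:     ", exact)
```

In this toy setting the naive expected gradient vanishes even though the KL divergence itself is nonzero, so using the estimate as a loss would apply no regularization on average. The second pitfall, which concerns token-by-token sequence generation, is not covered by this single-step sketch.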

