LG AI, May 14

$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data

arXiv:2605.15417

AI Analysis

This paper provides a principled framework for designing surrogate losses that train generative models on both on- and off-policy data, addressing a key limitation in training GFlowNets and related models.

The authors extend the mean squared error loss for GFlowNets to a family of $f$-divergence-based losses that keep the same global minimizer off-policy while matching the $f$-divergence gradients on-policy. They demonstrate improved mode coverage and off-policy applicability across synthetic tasks, molecule discovery, and LLM tuning.

In GFlowNets and variational inference, it has been shown that the mean squared error between target and model log probabilities is an effective, low-variance surrogate loss for training generative models. This loss has the property that, when evaluated on-policy, its gradients correspond to those of the KL divergence, while off-policy it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction can be extended to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are those of the corresponding $f$-divergence, but which retain the same global minimizer off-policy. Specifically, we show that the on-policy gradients lead to a one-to-one correspondence between translation-invariant loss functions on the target and model log probabilities, and $f$-divergences. This equivalence allows us to design new surrogate loss functions for tuning a wide class of generative models that inherit the properties of the corresponding $f$-divergence, such as being more mode covering, whilst being applicable to off-policy data. We apply our losses to a range of tasks, including classic synthetic examples, SynFlowNets for molecule discovery, and asynchronous large language model (LLM) tuning, demonstrating that our models retain their predicted properties on- and off-policy in a wide class of generative models.
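The baseline property described in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a categorical model $q_\theta = \mathrm{softmax}(\theta)$ over five states, a normalized target $p$, and takes the KL divergence in question to be the reverse KL, $\mathrm{KL}(q_\theta \,\|\, p)$. All function names (`softmax`, `kl`, `surrogate`, `num_grad`) are hypothetical helpers introduced here for illustration. It covers only the mean squared error case, not the general $f$-divergence family.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over a 1-D parameter vector.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def kl(theta, log_p):
    # Reverse KL divergence KL(q_theta || p) for a categorical model.
    q = softmax(theta)
    return float(np.sum(q * (np.log(q) - log_p)))

def surrogate(theta, log_p, weights):
    # Half squared error between model and target log probabilities,
    # averaged under FIXED sampling weights (no gradient flows through them).
    log_q = np.log(softmax(theta))
    return float(np.sum(weights * 0.5 * (log_q - log_p) ** 2))

def num_grad(fn, theta, eps=1e-6):
    # Central finite-difference gradient, used to avoid an autodiff dependency.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (fn(theta + step) - fn(theta - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
theta = rng.normal(size=5)                   # model parameters (assumed toy setup)
log_p = np.log(softmax(rng.normal(size=5)))  # normalized target log probabilities

# On-policy: the sampling weights equal the current model, held constant.
q_frozen = softmax(theta)
grad_surrogate = num_grad(lambda t: surrogate(t, log_p, q_frozen), theta)
grad_kl = num_grad(lambda t: kl(t, log_p), theta)
on_policy_gap = float(np.max(np.abs(grad_surrogate - grad_kl)))

# Off-policy: uniform sampling weights; the loss vanishes when model == target.
uniform = np.full(5, 0.2)
off_policy_loss_at_target = surrogate(log_p.copy(), log_p, uniform)

print(on_policy_gap, off_policy_loss_at_target)
```

Both printed quantities should be close to zero: the first because the surrogate gradient with frozen on-policy weights coincides with the reverse KL gradient, the second because the squared log-ratio is zero wherever the model equals the target, whatever distribution the samples come from. With an unnormalized target, a learned log-normalizer would have to be added to the model term, which this sketch omits.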
