QGFN: Controllable Greediness with Action Values
This work addresses a specific challenge in energy-based generative methods for combinatorial objects and offers an incremental improvement of interest to researchers in generative modeling and reinforcement learning.
The paper tackles the problem of biasing Generative Flow Networks (GFlowNets) toward producing more high-utility samples. It combines GFN policies with action-value estimates ($Q$) to obtain greedier sampling policies whose degree of greediness is controllable, and these policies generate more high-reward samples across tasks without losing diversity.
Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, increase the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
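The abstract does not spell out how the GFN policy and $Q$ are combined. The sketch below assumes one simple mixing rule, a p-greedy mixture: with probability p the sampler takes the action with the highest $Q$ value, and otherwise it samples from the GFN forward policy. The function names `mixed_policy` and `sample_action` are hypothetical, and this is an illustration of how a single mixing parameter can control greediness, not the authors' implementation or necessarily one of the paper's variants.

```python
import random
from typing import List, Sequence


def mixed_policy(pf: Sequence[float], q: Sequence[float], p: float) -> List[float]:
    """Action distribution of a hypothetical p-greedy mix of a GFN policy and Q.

    pf: GFN forward-policy probabilities over the actions available in a state.
    q:  action-value estimates Q(s, a) for the same actions.
    p:  mixing parameter in [0, 1]; p = 0 recovers the GFN policy,
        p = 1 always picks argmax_a Q(s, a).
    """
    if len(pf) != len(q):
        raise ValueError("pf and q must have the same length")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Index of the greedy action under the Q estimate.
    greedy = max(range(len(q)), key=lambda a: q[a])
    # With probability (1 - p) follow the GFN policy ...
    probs = [(1.0 - p) * x for x in pf]
    # ... and with probability p take the greedy action.
    probs[greedy] += p
    return probs


def sample_action(pf: Sequence[float], q: Sequence[float], p: float,
                  rng: random.Random) -> int:
    """Draw one action index from the mixed distribution."""
    probs = mixed_policy(pf, q, p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    pf = [0.5, 0.3, 0.2]   # illustrative GFN probabilities (made up)
    q = [0.1, 0.9, 0.4]    # illustrative Q estimates (made up)
    for p in (0.0, 0.5, 1.0):
        print(p, mixed_policy(pf, q, p))
    print(sample_action(pf, q, 0.5, random.Random(0)))
```

With these made-up numbers, p = 0.5 moves probability mass toward the action favored by $Q$, giving roughly [0.25, 0.65, 0.1]. Raising p makes sampling greedier, while keeping p below 1 preserves some of the GFN policy's diversity.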