Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
This work addresses a bottleneck in discrete data generation for applications like text-to-image generation and multimodal understanding, offering a more accurate and efficient guidance method.
The paper tackles inaccurate guidance in discrete data generation by proposing a framework that derives exact transition rates, so that guidance requires only a single forward pass per sampling step. It demonstrates effectiveness on energy-guided simulations and preference alignment tasks, with significant improvements in sampling efficiency.
Guidance provides a simple and effective framework for posterior sampling by steering the generation process towards the desired distribution. For discrete data, existing approaches mostly rely on guidance based on a first-order Taylor approximation to improve sampling efficiency. However, such an approximation is ill-suited to discrete state spaces, where the approximation error can be large. To address this problem, we propose a novel guidance framework for discrete data: we derive the exact transition rate for the desired distribution given a learned discrete flow matching model, which yields guidance that requires only a single forward pass in each sampling step and significantly improves efficiency. The framework is general: it encompasses existing guidance methods as special cases and can be seamlessly applied to masked diffusion models. We demonstrate the effectiveness of the proposed guidance on energy-guided simulations and on preference alignment for text-to-image generation and multimodal understanding tasks. The code is available at https://github.com/WanZhengyan/Discrete-Guidance-Matching/tree/main.
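To make the claim about approximation error concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes that guidance enters through a log-ratio between a neighbouring state and the current state, and compares the exact log-ratio with its first-order Taylor approximation taken in a one-hot embedding. The guidance function `log_h`, its weights `w`, and all helper names are hypothetical and chosen only so that the gradient is available in closed form.

```python
import numpy as np

def one_hot(x, num_states):
    """Embed an integer sequence x of shape (D,) as a (D, S) one-hot matrix."""
    return np.eye(num_states)[x]

def log_h(e, w):
    """Hypothetical log guidance weight: a quadratic function of the embedding."""
    return -np.sum(w * e) ** 2

def grad_log_h(e, w):
    """Analytic gradient of log_h with respect to the embedding e."""
    return -2.0 * np.sum(w * e) * w

def exact_log_ratios(x, w, num_states):
    """log_h(y) - log_h(x) for every y that differs from x in one position.
    This needs one evaluation of log_h per neighbour (D * S evaluations)."""
    base = log_h(one_hot(x, num_states), w)
    out = np.zeros((len(x), num_states))
    for d in range(len(x)):
        for s in range(num_states):
            y = x.copy()
            y[d] = s
            out[d, s] = log_h(one_hot(y, num_states), w) - base
    return out

def taylor_log_ratios(x, w, num_states):
    """First-order approximation grad(x) . (e_y - e_x), from a single gradient."""
    g = grad_log_h(one_hot(x, num_states), w)
    # Changing position d from x[d] to s gives the inner product g[d, s] - g[d, x[d]].
    return g - g[np.arange(len(x)), x][:, None]

rng = np.random.default_rng(0)
D, S = 4, 5                      # sequence length and number of states (assumed)
w = rng.normal(size=(D, S))      # hypothetical weights of the guidance function
x = rng.integers(0, S, size=D)   # current discrete state

exact = exact_log_ratios(x, w, S)
approx = taylor_log_ratios(x, w, S)
max_gap = np.max(np.abs(exact - approx))
print("largest gap between exact and Taylor log-ratio:", max_gap)
```

For this quadratic toy function the gap has a closed form: it equals the squared difference between the weight of the new token and the weight of the current token, and it does not shrink because a jump between one-hot vectors is never a small step. The exact computation here costs one function evaluation per neighbouring state, which is the cost that approximate guidance tries to avoid.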