LG · Mar 11

Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang

arXiv:2603.10395v1 · 7.83 citations · h-index: 8

Predicted impact: top 23% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses a significant problem in graph generation for applications like drug discovery, though it appears incremental by building on existing graph flow models.

The paper tackles the challenge of aligning graph flow models with complex human preferences or task-specific objectives. It proposes Graph-GRPO, an online reinforcement learning framework that reaches 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively, with only 50 denoising steps, and reports state-of-the-art performance on molecular optimization tasks.

Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, also known as the graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
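
The abstract does not spell out how rewards enter the training signal. As a rough illustration only, and assuming the "GRPO" in the name refers to the usual group-relative scheme (Group Relative Policy Optimization), the sketch below normalizes each sampled graph's reward against the other samples in its group. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all hypothetical choices made here, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group (GRPO-style sketch).

    Each advantage is (reward - group mean) / (group std + eps), so a
    sample is judged only relative to its siblings in the same group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifiable rewards for four graphs sampled in one group,
# e.g. 1.0 if a generated graph passes a validity check and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Under this scheme a group whose samples all receive the same reward yields zero advantages, so it contributes no gradient signal; how Graph-GRPO handles that case is not stated in the abstract.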
