LG | AI | Aug 22, 2025

Guiding Diffusion Models with Reinforcement Learning for Stable Molecule Generation

Zhijian Zhou, Junyi An, Zongkai Liu, Yunfei Shi, Xuan Zhang, Fenglei Cao, Chao Qu, Yuan Qi

arXiv:2508.16521v1 | 4 citations | h-index: 2 | Has Code

Originality: Highly original

AI Analysis

This paper addresses the problem of generating stable molecules for computational chemistry and drug discovery. It represents an incremental improvement that combines diffusion models with reinforcement learning.

The paper tackles the challenge of generating physically realistic 3D molecular structures by proposing Reinforcement Learning with Physical Feedback (RLPF), which fine-tunes equivariant diffusion models using physics-based rewards. Experiments on QM9 and GEOM-drug datasets show RLPF significantly improves molecular stability compared to existing methods.

Generating physically realistic 3D molecular structures remains a core challenge in molecular generative modeling. While diffusion models equipped with equivariant neural networks have made progress in capturing molecular geometries, they often struggle to produce equilibrium structures that adhere to physical principles such as force field consistency. To bridge this gap, we propose Reinforcement Learning with Physical Feedback (RLPF), a novel framework that extends Denoising Diffusion Policy Optimization to 3D molecular generation. RLPF formulates the task as a Markov decision process and applies proximal policy optimization to fine-tune equivariant diffusion models. Crucially, RLPF introduces reward functions derived from force-field evaluations, providing direct physical feedback to guide the generation toward energetically stable and physically meaningful structures. Experiments on the QM9 and GEOM-drug datasets demonstrate that RLPF significantly improves molecular stability compared to existing methods. These results highlight the value of incorporating physics-based feedback into generative modeling. The code is available at: https://github.com/ZhijianZhou/RLPF/tree/verl_diffusion.
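The abstract describes three ingredients: each denoising step is treated as an action in a Markov decision process, the policy is updated with proximal policy optimization, and the reward comes from a force-field evaluation. The sketch below illustrates those pieces in plain Python under stated assumptions. It is not the authors' implementation: the function names, the isotropic Gaussian form of the denoising policy, the negative root-mean-square force reward, and the batch-standardized advantages are all illustrative choices, and the forces are passed in as precomputed numbers rather than obtained from any real force-field package.

```python
import math


def gaussian_log_prob(x, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) at x.

    Assumption: one denoising step x_t -> x_{t-1} is modeled as a Gaussian
    policy, so this is the log-probability of the sampled "action".
    """
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))


def force_rms_reward(forces):
    """Hypothetical physics reward: negative RMS of per-atom force vectors.

    `forces` is a list of (fx, fy, fz) tuples assumed to come from some
    external force-field evaluation. Smaller forces mean the structure is
    closer to equilibrium, so the reward is higher (closer to zero).
    """
    mean_sq = sum(fx * fx + fy * fy + fz * fz for fx, fy, fz in forces) / len(forces)
    return -math.sqrt(mean_sq)


def standardize(rewards):
    """Turn a batch of rewards into zero-mean, unit-variance advantages."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # small constant avoids division by zero
    return [(r - mean) / std for r in rewards]


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss, averaged over denoising steps.

    logp_new / logp_old are log-probabilities of the same sampled steps
    under the current and the sampling-time policy, respectively.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # negated because optimizers minimize
```

In a full training loop under these assumptions, one would sample molecules with the current model, store the per-step log-probabilities, score each final structure with `force_rms_reward`, broadcast the standardized reward to every step of its trajectory, and minimize `ppo_clip_loss` with respect to the model parameters.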
