RGFN: Synthesizable Molecular Generation Using GFlowNets
RGFN addresses the challenge of experimental validation in drug discovery by generating molecular candidates that are easier to synthesize, though it is an incremental extension of the GFlowNet framework.
The paper tackles the poor synthesizability of small molecules generated by machine learning. It proposes Reaction-GFlowNet (RGFN), which operates in the space of chemical reactions to ensure synthesizability while maintaining candidate quality. With the proposed reactions and building blocks, the search space is orders of magnitude larger than existing screening libraries, and the cost of synthesis stays low.
Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
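To make the idea of generating in reaction space concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical set of five building blocks and two reaction types, and it represents each molecule only by its remaining functional groups. All names (`BUILDING_BLOCKS`, `REACTIONS`, `expand`, `enumerate_routes`) are illustrative. The point is that every state is reached by a sequence of reactions, so each one carries its own synthesis route, and the number of routes grows quickly with the number of steps.

```python
# Toy sketch of a reaction-based state space. All data below is hypothetical
# and is not the reaction or building-block set used in the paper.

# Each building block exposes a set of functional groups.
BUILDING_BLOCKS = {
    "A1": {"amine"},
    "A2": {"amine", "boronic_acid"},
    "C1": {"carboxylic_acid"},
    "C2": {"carboxylic_acid", "aryl_halide"},
    "H1": {"aryl_halide"},
}

# Reaction templates: (name, group needed on the molecule, group needed on the block).
REACTIONS = [
    ("amide_coupling", "amine", "carboxylic_acid"),
    ("amide_coupling", "carboxylic_acid", "amine"),
    ("suzuki", "boronic_acid", "aryl_halide"),
    ("suzuki", "aryl_halide", "boronic_acid"),
]


def expand(state):
    """Yield every state reachable from `state` by one reaction with one block."""
    route, groups = state
    for name, need_mol, need_block in REACTIONS:
        if need_mol not in groups:
            continue  # the current molecule cannot take part in this reaction
        for block, block_groups in BUILDING_BLOCKS.items():
            if need_block in block_groups:
                # The two reacting groups are consumed; the rest stay available.
                new_groups = (groups - {need_mol}) | (block_groups - {need_block})
                yield (route + ((name, block),), frozenset(new_groups))


def enumerate_routes(max_steps):
    """Return all states reachable with at most `max_steps` reactions."""
    frontier = [((("start", b),), frozenset(g)) for b, g in BUILDING_BLOCKS.items()]
    all_states = list(frontier)
    for _ in range(max_steps):
        frontier = [nxt for s in frontier for nxt in expand(s)]
        all_states.extend(frontier)
    return all_states


if __name__ == "__main__":
    for steps in range(4):
        print(steps, "reaction steps:", len(enumerate_routes(steps)), "routes")
```

The sketch counts synthesis routes rather than unique molecules, since different routes can lead to the same product, and it ignores the multiplicity of functional groups. A GFlowNet would replace the exhaustive enumeration with a learned policy that picks one reaction and one building block at each step.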