Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors
This work addresses the challenge of generating molecules that can actually be synthesized, a key obstacle in drug discovery, and offers a more flexible and scalable alternative to earlier hard-constraint methods.
The paper tackles the problem of generating synthesizable molecules for drug discovery by proposing S3-GFN, a sequence-based GFlowNet trained with soft regularization and rich chemical priors. It reports that at least 95% of generated molecules are synthesizable, with higher rewards across diverse tasks.
The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.
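The replay-based training signal described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the loss, so the hinge-style contrastive penalty, the trajectory-balance objective, and all names here (`DualReplayBuffer`, `contrastive_penalty`, `margin`) are assumptions chosen only to show how two separate buffers could supply a contrastive signal. The synthesizability label is assumed to come from an external checker that is not shown.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Hypothetical pair of FIFO buffers: synthesizable vs. unsynthesizable samples."""

    def __init__(self, capacity=1000, seed=0):
        self.pos = deque(maxlen=capacity)  # synthesizable (SMILES, reward) pairs
        self.neg = deque(maxlen=capacity)  # unsynthesizable (SMILES, reward) pairs
        self.rng = random.Random(seed)

    def add(self, smiles, reward, synthesizable):
        # The label is assumed to come from an external synthesizability check.
        target = self.pos if synthesizable else self.neg
        target.append((smiles, reward))

    def sample(self, k):
        # Draw up to k items from each buffer for one off-policy update.
        pos = self.rng.sample(list(self.pos), min(k, len(self.pos)))
        neg = self.rng.sample(list(self.neg), min(k, len(self.neg)))
        return pos, neg


def trajectory_balance_loss(log_z, log_pf, log_reward):
    # Trajectory balance for an autoregressive sequence model. Each string has
    # a single construction path, so the backward log-probability is zero.
    # Assumption: the abstract does not state which GFlowNet objective is used.
    return (log_z + log_pf - log_reward) ** 2


def contrastive_penalty(pos_log_pf, neg_log_pf, margin=1.0):
    # Assumed hinge form: penalize every (pos, neg) pair in which the policy's
    # log-probability of the synthesizable sample does not exceed that of the
    # unsynthesizable sample by at least `margin`.
    if not pos_log_pf or not neg_log_pf:
        return 0.0
    terms = [max(0.0, margin - (p - n)) for p in pos_log_pf for n in neg_log_pf]
    return sum(terms) / len(terms)
```

Under these assumptions, one training step would sample from both buffers, score the samples with the current policy, and add a weighted `contrastive_penalty` to the reward-matching loss, so that unsynthesizable regions are discouraged softly instead of being excluded from the action space.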