RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets
This work addresses the challenge of verifying reaction feasibility in molecular discovery, a problem that matters to chemists and drug developers, although its contribution is an incremental improvement over existing retrosynthesis methods.
The paper tackles the problem of single-step retrosynthesis by proposing RetroGFN, a model that explores beyond limited datasets to generate diverse and feasible reactions, achieving competitive top-k accuracy and outperforming existing methods on round-trip accuracy.
Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized through multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequently, existing models are not encouraged to explore the space of possible reactions sufficiently. In this paper, we propose a novel single-step retrosynthesis model, RetroGFN, that can explore outside the limited dataset and return a diverse set of feasible reactions by leveraging a feasibility proxy model during training. We show that RetroGFN achieves competitive results on standard top-k accuracy while outperforming existing methods on round-trip accuracy. Moreover, we provide empirical arguments in favor of using round-trip accuracy, which expands the notion of feasibility with respect to the standard top-k accuracy metric.
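The difference between the two metrics can be illustrated with a minimal sketch. The abstract does not spell out the definitions, so this assumes a common reading: top-k accuracy counts a target as solved only if the single reaction recorded in the dataset appears among the top k predictions, while round-trip accuracy counts a predicted reaction as feasible if a forward reaction-prediction model maps its reactants back to the target. The function names, the string representation of molecules, and the lookup-table forward model are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Assumption: molecules and reactant sets are represented as plain strings
# (for example, canonical SMILES); any hashable representation would do.
Predictions = Dict[str, List[str]]  # target -> ranked list of reactant sets


def top_k_accuracy(predictions: Predictions, ground_truth: Dict[str, str], k: int) -> float:
    """Fraction of targets whose dataset reaction is among the top-k predictions."""
    hits = sum(
        1
        for target, true_reactants in ground_truth.items()
        if true_reactants in predictions.get(target, [])[:k]
    )
    return hits / len(ground_truth) if ground_truth else 0.0


def round_trip_accuracy(
    predictions: Predictions, forward_model: Callable[[str], str], k: int
) -> float:
    """Fraction of top-k predicted reactions that the forward model maps back to the target."""
    total = 0
    feasible = 0
    for target, candidates in predictions.items():
        for reactants in candidates[:k]:
            total += 1
            if forward_model(reactants) == target:
                feasible += 1
    return feasible / total if total else 0.0


# Toy data (hypothetical): two targets, two ranked predictions each.
toy_predictions = {"P": ["A.B", "C.D"], "Q": ["E.F", "G.H"]}
toy_ground_truth = {"P": "A.B", "Q": "G.H"}

# Stand-in forward model: a lookup table instead of a trained network.
toy_forward_table = {"A.B": "P", "C.D": "P", "E.F": "X", "G.H": "Q"}


def toy_forward_model(reactants: str) -> str:
    return toy_forward_table.get(reactants, "")


top1 = top_k_accuracy(toy_predictions, toy_ground_truth, k=1)
round_trip_top2 = round_trip_accuracy(toy_predictions, toy_forward_model, k=2)
```

In this toy setup the prediction "C.D" for target "P" is absent from the dataset, so it never contributes to top-k accuracy, yet it counts toward round-trip accuracy because the stand-in forward model maps it back to "P". This is the sense in which round-trip accuracy can credit feasible reactions that lie outside the recorded data.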