Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
This work addresses the challenge of controlling and optimizing synthetic routes in drug discovery. It offers a plug-and-play solution that improves performance across various strategies, though the contribution is incremental because it builds on existing methods.
The paper tackles the problem of generating high-quality synthetic routes for molecule synthesis. Existing strategies select the next molecule set greedily and offer no control over criteria such as cost and yield. The paper proposes a framework that uses conditional residual energy-based models to enhance route quality, achieving a 2.5% improvement in top-1 accuracy over previous state-of-the-art methods.
Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-down manner. Despite their effective performance, these strategies are limited in synthetic route generation because they greedily select the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes according to criteria such as material cost, yield, and step count. In this work, we propose a general and principled framework based on conditional residual energy-based models (EBMs) that focuses on the quality of the entire synthetic route with respect to specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework consistently boosts performance across various strategies and exceeds the previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.
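To make the residual idea concrete, the following is a minimal sketch of how an energy term could re-rank candidate routes produced by an existing strategy. It assumes the common residual EBM form, in which a route's score is its base log-probability minus an energy, and it normalizes over a finite candidate set. The function name `rerank_routes`, the route labels, and all numbers are hypothetical and chosen for illustration only. They are not taken from the paper or its released code, and the paper's actual training and inference procedure may differ.

```python
import math
from typing import Dict, List, Tuple


def rerank_routes(
    log_probs: Dict[str, float],
    energies: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Re-rank candidate routes with a residual energy term.

    Assumed residual form: p(route) is proportional to
    p_base(route) * exp(-E(route)), normalized over the candidates given.
    """
    # Unnormalized log-score of each route: log p_base minus energy.
    scores = {route: log_probs[route] - energies[route] for route in log_probs}

    # Log-sum-exp over the candidate set (subtract the max for stability).
    max_score = max(scores.values())
    log_z = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores.values())
    )

    # Convert to normalized probabilities and sort from most to least probable.
    ranked = [(route, math.exp(s - log_z)) for route, s in scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical candidates for one target molecule (made-up numbers).
base_log_probs = {
    "route_a": math.log(0.5),
    "route_b": math.log(0.3),
    "route_c": math.log(0.2),
}
# Lower energy means the route better satisfies the chosen criterion
# (for example, material cost). These values are illustrative only.
energies = {"route_a": 2.0, "route_b": 0.5, "route_c": 1.0}

ranked = rerank_routes(base_log_probs, energies)
best_route = ranked[0][0]
```

In this toy setup the base model prefers `route_a`, but its high energy pushes it down once the criterion is taken into account, which is the plug-and-play behavior the abstract describes: the base strategy is left unchanged and only the ranking of its outputs is adjusted.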