Bayesian Sequential Stacking Algorithm for Concurrently Designing Molecules and Synthetic Reaction Networks
This work addresses a critical bottleneck in practical molecular design by planning the synthesis concurrently with the design, which could accelerate drug discovery and materials science. The contribution is incremental in that it builds on existing methods for synthetic route design.
The paper tackles the problem of simultaneously designing molecules with desired properties and their synthetic routes, which is challenging because of the large combinatorial space and the complex network topologies involved. It presents a Bayesian sequential Monte Carlo algorithm that, in a case study on drug-like molecules, substantially outperforms heuristic methods in computational efficiency, coverage, and novelty.
In the last few years, de novo molecular design using machine learning has made great technical progress, but its practical deployment has not been as successful. This is mostly owing to the cost and technical difficulty of synthesizing such computationally designed molecules. To overcome these barriers, various methods for synthetic route design using deep neural networks have been studied intensively. However, little progress has been made in designing molecules and their synthetic routes simultaneously. Here, we formulate the problem of simultaneously designing molecules with a desired set of properties and their synthetic routes within the framework of Bayesian inference. The design variables consist of the set of reactants in a reaction network and the topology of that network. The design space is extremely large because it consists of all combinations of purchasable reactants, whose number is often on the order of millions or more. In addition, the designed reaction networks can adopt any topology beyond simple multistep linear reaction routes. To solve this hard combinatorial problem, we present a sequential Monte Carlo algorithm that recursively designs a synthetic reaction network by sequentially building up single-step reactions. In a case study of designing drug-like molecules from commercially available compounds, the proposed method substantially outperforms heuristic combinatorial search methods in terms of computational efficiency, coverage, and novelty with respect to existing compounds.
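The idea of building a route one single-step reaction at a time with sequential Monte Carlo can be illustrated with a toy sketch. This is not the authors' implementation: the scalar "reactants", the additive `toy_react` function, the tempered intermediate targets in `log_target`, and all parameter values are assumptions made purely for illustration. A real system would replace them with a reaction predictor, property models, and a catalog of purchasable compounds, and would also handle non-linear network topologies, which this linear-route sketch does not.

```python
import math
import random


def toy_react(intermediate, reactant):
    """Hypothetical single-step reaction: add scalar property contributions."""
    return intermediate + reactant


def log_target(value, step, n_steps, target, sigma):
    """Log of an unnormalized intermediate target after `step` reactions.

    Step 0 is flat; the final step is a Gaussian likelihood centred on `target`.
    The linear schedule of partial goals is an illustrative assumption.
    """
    if step == 0:
        return 0.0
    partial_goal = target * step / n_steps
    return -0.5 * ((value - partial_goal) / sigma) ** 2


def smc_design(reactants, target, n_steps=3, n_particles=500, sigma=1.0, seed=0):
    """Return resampled particles, each a (route, product_value) pair."""
    rng = random.Random(seed)
    particles = [((), 0.0)] * n_particles
    for step in range(1, n_steps + 1):
        proposed, log_w = [], []
        for route, value in particles:
            # Proposal: extend the route with one uniformly chosen reactant.
            r = rng.choice(reactants)
            new_value = toy_react(value, r)
            proposed.append((route + (r,), new_value))
            # Incremental weight: new intermediate target over the previous one
            # (the uniform proposal contributes only a constant factor).
            log_w.append(
                log_target(new_value, step, n_steps, target, sigma)
                - log_target(value, step - 1, n_steps, target, sigma)
            )
        # Normalize in log space for numerical stability, then resample.
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return particles


if __name__ == "__main__":
    routes = smc_design(reactants=[0.5, 1.0, 2.0, 3.5, 5.0], target=9.0)
    distinct = {route for route, _ in routes}
    print(len(distinct), "distinct routes; example:", routes[0])
```

Resampling after each step concentrates the particle population on partial routes that remain compatible with the target property, which is what lets this kind of search avoid enumerating every combination of reactants.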