AIDec 1, 2025
SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic ChemistryDaniel Armstrong, Zlatko Jončev, Andres M Bran et al.
Modern computer-assisted synthesis planning (CASP) systems show promises at generating chemically valid reaction steps but struggle to incorporate strategic considerations such as convergent assembly, protecting group minimization, and optimal ring-forming sequences. We introduce a methodology that leverages Large Language Models to distill synthetic knowledge into code. Our system analyzes synthesis routes and translates strategic principles into Python functions representing diverse strategic and tactical rules, such as strategic functional group interconversions and ring construction strategies. By formalizing this knowledge as verifiable code rather than simple heuristics, we create testable, interpretable representations of synthetic strategy. We release the complete codebase and the USPTO-ST dataset -- synthesis routes annotated with strategic tags. This framework unlocks a novel capability for CASP: natural language-based route retrieval, achieving 75\% Top-3 accuracy on our benchmark. We further validate our library through temporal analysis of historical trends and chemically intuitive route clustering that offers more granular partitioning than common previous methods. This work bridges the tactical-strategic divide in CASP, enabling specification, search, and evaluation of routes by strategic criteria rather than structure alone.
AIDec 18, 2025
Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMsNguyen Xuan-Vu, Daniel Armstrong, Milena Wehrbach et al.
Computer-aided synthesis planning (CASP) has long been envisioned as a complementary tool for synthetic chemists. However, existing frameworks often lack mechanisms to allow interaction with human experts, limiting their ability to integrate chemists' insights. In this work, we introduce Synthelite, a synthesis planning framework that uses large language models (LLMs) to directly propose retrosynthetic transformations. Synthelite can generate end-to-end synthesis routes by harnessing the intrinsic chemical knowledge and reasoning capabilities of LLMs, while allowing expert intervention through natural language prompts. Our experiments demonstrate that Synthelite can flexibly adapt its planning trajectory to diverse user-specified constraints, achieving up to 95\% success rates in both strategy-constrained and starting-material-constrained synthesis tasks. Additionally, Synthelite exhibits the ability to account for chemical feasibility during route design. We envision Synthelite to be both a useful tool and a step toward a paradigm where LLMs are the central orchestrators of synthesis planning.
AIMar 11, 2025
Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidationAndres M Bran, Theo A Neukomm, Daniel P Armstrong et al.
While automated chemical tools excel at specific tasks, they have struggled to capture the strategic thinking that characterizes expert chemical reasoning. Here we demonstrate that large language models (LLMs) can serve as powerful tools enabling chemical analysis. When integrated with traditional search algorithms, they enable a new approach to computer-aided synthesis that mirrors human expert thinking. Rather than using LLMs to directly manipulate chemical structures, we leverage their ability to evaluate chemical strategies and guide search algorithms toward chemically meaningful solutions. We demonstrate this paradigm through two fundamental challenges: strategy-aware retrosynthetic planning and mechanism elucidation. In retrosynthetic planning, our system allows chemists to specify desired synthetic strategies in natural language -- from protecting group strategies to global feasibility assessment -- and uses traditional or LLM-guided Monte Carlo Tree Search to find routes that satisfy these constraints. In mechanism elucidation, LLMs guide the search for plausible reaction mechanisms by combining chemical principles with systematic exploration. This approach shows strong performance across diverse chemical tasks, with newer and larger models demonstrating increasingly sophisticated chemical reasoning. Our approach establishes a new paradigm for computer-aided chemistry that combines the strategic understanding of LLMs with the precision of traditional chemical tools, opening possibilities for more intuitive and powerful chemical automation systems.
BMMay 13, 2025
Generative Molecular Design with Steerable and Granular Synthesizability ControlJeff Guo, Víctor Sabanza-Gil, Zlatko Jončev et al.
Synthesizability in small molecule generative design remains a bottleneck. Existing works that do consider synthesizability can output predicted synthesis routes for generated molecules. However, there has been minimal attention in addressing the ease of synthesis and enabling flexibility to incorporate desired reaction constraints. In this work, we propose a small molecule generative design framework that enables steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives with predicted synthesis routes containing pre-defined allowed reactions, while optionally avoiding others. One can also enforce that all reactions belong to a pre-defined set. We show the capability to mix-and-match these reaction constraints across the most common medicinal chemistry transformations. Next, we show how our framework can be used to valorize industrial byproducts towards de novo optimized molecules. Going further, we demonstrate how granular control over synthesizability constraints can loosely mimic virtual screening of ultra-large make-on-demand libraries. Using only a single GPU, we generate and dock 15k molecules to identify promising candidates in Freedom 4.0 constituting 142B make-on-demand molecules (assessing only 0.00001% of the library). Generated molecules satisfying the reaction constraints have > 90% exact match rate. Lastly, we benchmark our framework against recent synthesizability-constrained generative models and demonstrate the highest sample efficiency even when imposing the additional constraint that all molecules must be synthesizable from a single reaction type. The main theme is demonstrating that a pre-trained generalist molecular generative model can be incentivized to generate property-optimized small molecules under challenging synthesizability constraints through reinforcement learning.
LGDec 5, 2025
Teaching Language Models Mechanistic Explainability Through Arrow-PushingThéo A. Neukomm, Zlatko Jončev, Philippe Schwaller
Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computational framework for teaching language models to predict chemical reaction mechanisms through arrow pushing formalism, a century-old notation that tracks electron flow while respecting conservation laws. We developed MechSMILES, a compact textual format encoding molecular structure and electron flow, and trained language models on four mechanism prediction tasks of increasing complexity using mechanistic reaction datasets, such as mech-USPTO-31k and FlowER. Our models achieve more than 95\% top-3 accuracy on elementary step prediction and scores that surpass 73\% on mech-USPTO-31k, and 93\% on FlowER dataset for the retrieval of complete reaction mechanisms on our hardest task. This mechanistic understanding enables three key applications. First, our models serve as post-hoc validators for CASP systems, filtering chemically implausible transformations. Second, they enable holistic atom-to-atom mapping that tracks all atoms, including hydrogens. Third, they extract catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. By grounding predictions in physically meaningful electron moves that ensure conservation of mass and charge, this work provides a pathway toward more explainable and chemically valid computational synthesis planning, while providing an architecture-agnostic framework for the benchmarking of mechanism prediction.
LGJul 29, 2025
TempRe: Template generation for single and direct multi-step retrosynthesisNguyen Xuan-Vu, Daniel P Armstrong, Zlatko Jončev et al.
Retrosynthesis planning remains a central challenge in molecular discovery due to the vast and complex chemical reaction space. While traditional template-based methods offer tractability, they suffer from poor scalability and limited generalization, and template-free generative approaches risk generating invalid reactions. In this work, we propose TempRe, a generative framework that reformulates template-based approaches as sequence generation, enabling scalable, flexible, and chemically plausible retrosynthesis. We evaluated TempRe across single-step and multi-step retrosynthesis tasks, demonstrating its superiority over both template classification and SMILES-based generation methods. On the PaRoutes multi-step benchmark, TempRe achieves strong top-k route accuracy. Furthermore, we extend TempRe to direct multi-step synthesis route generation, providing a lightweight and efficient alternative to conventional single-step and search-based approaches. These results highlight the potential of template generative modeling as a powerful paradigm in computer-aided synthesis planning.