Anton Morgunov

LG
h-index: 10
Papers: 4
Citations: 37
Novelty: 55%
AI Score: 41

4 Papers

LG · Sep 11, 2023 · Code
ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

Gregory W. Kyro, Anton Morgunov, Rafael I. Brent et al.

The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.

LG · Dec 8, 2025
Procrustean Bed for AI-Driven Retrosynthesis: A Unified Framework for Reproducible Evaluation

Anton Morgunov, Victor S. Batista

Progress in computer-aided synthesis planning (CASP) is obscured by the lack of standardized evaluation infrastructure and the reliance on metrics that prioritize topological completion over chemical validity. We introduce RetroCast, a unified evaluation suite that standardizes heterogeneous model outputs into a common schema to enable statistically rigorous, apples-to-apples comparison. The framework includes a reproducible benchmarking pipeline with stratified sampling and bootstrapped confidence intervals, accompanied by SynthArena, an interactive platform for qualitative route inspection. We utilize this infrastructure to evaluate leading search-based and sequence-based algorithms on a new suite of standardized benchmarks. Our analysis reveals a divergence between "solvability" (stock-termination rate) and route quality; high solvability scores often mask chemical invalidity or fail to correlate with the reproduction of experimental ground truths. Furthermore, we identify a "complexity cliff" in which search-based methods, despite high solvability rates, exhibit a sharp performance decay in reconstructing long-range synthetic plans compared to sequence-based approaches. We release the full framework, benchmark definitions, and a standardized database of model predictions to support transparent and reproducible development in the field.
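The bootstrapped confidence intervals mentioned in the abstract can be illustrated with a minimal sketch. This is a generic percentile bootstrap over per-target 0/1 outcomes, not RetroCast's actual implementation; the function name `bootstrap_ci`, its parameters, and the example data are all hypothetical.

```python
import random

def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes.

    Hypothetical helper for illustration; not part of any published package.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample the targets with replacement and record the resampled rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, lower, upper

# Made-up data: 1 = predicted route terminates in stock, 0 = it does not.
solved = [1] * 70 + [0] * 30
point, lower, upper = bootstrap_ci(solved)
print(f"solvability = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Reporting the interval alongside the point estimate is what makes a comparison between two models statistically meaningful: overlapping intervals suggest the observed gap may be sampling noise.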

LG · May 22, 2024
DirectMultiStep: Direct Route Generation for Multistep Retrosynthesis

Yu Shee, Anton Morgunov, Haote Li et al.

Traditional computer-aided synthesis planning (CASP) methods rely on iterative single-step predictions, leading to exponential search-space growth that limits efficiency and scalability. We introduce a series of transformer-based models that leverage a mixture-of-experts approach to directly generate multistep synthetic routes as a single string, conditionally predicting each transformation based on all preceding ones. Our DMS Explorer XL model, which requires only target compounds as input, outperforms state-of-the-art methods on the PaRoutes dataset with 1.9x and 3.1x improvements in Top-1 accuracy on the n$_1$ and n$_5$ test sets, respectively. Providing additional information, such as the desired number of steps and starting materials, enables both a reduction in model size and an increase in accuracy, highlighting the benefits of incorporating more constraints into the prediction process. The top-performing DMS-Flex (Duo) model scores 25-50% higher on Top-1 and Top-10 accuracies for both n$_1$ and n$_5$ sets. Additionally, our models successfully predict routes for FDA-approved drugs not included in the training data, demonstrating strong generalization capabilities. While the limited diversity of the training set may affect performance on less common reaction types, our multistep-first approach presents a promising direction towards fully automated retrosynthetic planning.
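The Top-1 and Top-10 accuracies quoted in the abstract are instances of Top-k accuracy. The sketch below assumes a prediction counts as correct when the reference route string appears verbatim among the first k predicted routes; the paper's exact matching criterion may differ, and the function name and route strings are hypothetical.

```python
def top_k_accuracy(predictions, references, k):
    """Fraction of targets whose reference route is among the first k predictions.

    Assumes exact string equality; real routes would need canonicalization first.
    """
    hits = sum(ref in preds[:k] for preds, ref in zip(predictions, references))
    return hits / len(references)

# Made-up route strings for three targets, ranked best-first.
predictions = [["A>B>C", "A>D>C"], ["E>F", "E>G"], ["H>I", "H>J"]]
references = ["A>B>C", "E>G", "H>K"]

top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
print(f"Top-1 = {top1:.3f}, Top-2 = {top2:.3f}")
```

In this toy data the first target is matched at rank 1, the second at rank 2, and the third not at all, so Top-k accuracy can only stay flat or rise as k grows.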

AI · Sep 18, 2025
FragmentRetro: A Quadratic Retrosynthetic Method Based on Fragmentation Algorithms

Yu Shee, Anthony M. Smaldone, Anton Morgunov et al.

Retrosynthesis, the process of deconstructing a target molecule into simpler precursors, is crucial for computer-aided synthesis planning (CASP). Widely adopted tree-search methods often suffer from exponential computational complexity. In this work, we introduce FragmentRetro, a novel retrosynthetic method that leverages fragmentation algorithms, specifically BRICS and r-BRICS, combined with stock-aware exploration and pattern fingerprint screening to achieve quadratic complexity. FragmentRetro recursively combines molecular fragments and verifies their presence in a building block set, providing sets of fragment combinations as retrosynthetic solutions. We present the first formal computational analysis of retrosynthetic methods, showing that tree search exhibits exponential complexity $O(b^h)$, DirectMultiStep scales as $O(h^6)$, and FragmentRetro achieves $O(h^2)$, where $h$ represents the number of heavy atoms in the target molecule and $b$ is the branching factor for tree search. Evaluations on PaRoutes, USPTO-190, and natural products demonstrate that FragmentRetro achieves high solved rates with competitive runtime, including cases where tree search fails. The method benefits from fingerprint screening, which significantly reduces substructure matching complexity. While FragmentRetro focuses on efficiently identifying fragment-based solutions rather than full reaction pathways, its computational advantages and ability to generate strategic starting candidates establish it as a powerful foundational component for scalable and automated synthesis planning.
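To make the three scaling laws in the abstract concrete, the sketch below evaluates the leading terms $b^h$, $h^6$, and $h^2$ for a few molecule sizes. Big-O expressions drop constant factors, so these numbers show growth rates only, not runtimes; the branching factor `b=2` and the chosen values of `h` are arbitrary assumptions, and the function names are hypothetical.

```python
def tree_search_term(h, b):
    """Leading term b^h quoted for tree search."""
    return b ** h

def direct_multistep_term(h):
    """Leading term h^6 quoted for DirectMultiStep."""
    return h ** 6

def fragment_retro_term(h):
    """Leading term h^2 quoted for FragmentRetro."""
    return h ** 2

# h = number of heavy atoms; b = 2 is an arbitrary illustrative branching factor.
for h in (10, 20, 40):
    print(
        f"h={h:>2}  b^h={tree_search_term(h, b=2):>14}  "
        f"h^6={direct_multistep_term(h):>11}  h^2={fragment_retro_term(h):>5}"
    )
```

Even with a branching factor as small as 2, the exponential term overtakes the sixth-power term somewhere between h = 20 and h = 40, while the quadratic term stays orders of magnitude below both.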