LG · May 23

Representation-Guided Discrete Molecular Graph Retrosynthesis

arXiv:2605.24428 · 42.5
Predicted impact: top 59% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For chemoinformatics researchers, this work improves retrosynthesis prediction accuracy and efficiency by incorporating representation guidance from pretrained encoders into diffusion-based graph generators.

The paper introduces Graph-oriented Representation Guidance (GRG) for template-free single-step retrosynthesis. GRG reaches 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k with a diversity of 15.5, both outperforming the base generator. It also reduces the number of training epochs needed to reach comparable performance by 35% and the wall-clock time by 30%.
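As a rough illustration of what representation guidance can look like during training, the sketch below adds an alignment term to a generator's denoising loss, pulling per-node student features toward features from a frozen pretrained encoder. The cosine-based form, the index-based node correspondence, and the names `alignment_loss`, `guided_loss`, and `weight` are assumptions made for this example; the listing does not specify the paper's actual formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(student_nodes, teacher_nodes):
    """Mean (1 - cosine similarity) over node pairs.

    Assumption: node i of the student is matched to node i of the teacher.
    Other correspondence strategies are possible and are not shown here.
    """
    if len(student_nodes) != len(teacher_nodes):
        raise ValueError("student and teacher must have the same number of nodes")
    if not student_nodes:
        raise ValueError("at least one node is required")
    total = sum(1.0 - cosine(s, t) for s, t in zip(student_nodes, teacher_nodes))
    return total / len(student_nodes)


def guided_loss(denoising_loss, student_nodes, teacher_nodes, weight=0.5):
    """Denoising loss plus a weighted alignment term (weight is a hypothetical knob)."""
    return denoising_loss + weight * alignment_loss(student_nodes, teacher_nodes)
```

In this sketch, features that point in the same direction as the teacher's contribute zero to the alignment term regardless of their scale, so only the direction of the student features is constrained.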

Stochastic process-based molecular graph generators have become the state of the art for template-free single-step retrosynthesis. However, these models are typically trained only on product-reactant pairs, thereby acquiring chemistry-relevant representations in an indirect and implicit manner. Meanwhile, recent advances in computer vision demonstrate that offering representation guidance to a generator can effectively distill semantics from pretrained encoders into diffusion transformers (DiTs), substantially improving both convergence and generation quality. Whether similar gains extend to the retrosynthesis task, and what graph-specific design choices can make them work, remain open questions. To address these questions, we conduct a systematic empirical study over a unified design space spanning teacher molecular representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies, and guidance schemes. Guided by these considerations, we develop Graph-oriented Representation Guidance (GRG), which achieves 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k, while increasing diversity to 15.5, both substantially outperforming the adopted base generator. Notably, GRG consistently improves all top-k metrics in out-of-distribution settings, suggesting that representation guidance facilitates the acquisition of intrinsic chemical semantics. The introduced representation guidance also reduces the number of epochs needed to reach comparable performance by 35% and the wall-clock time by 30%. In addition, we introduce a simple yet effective representation-similarity-based reranking mechanism, which further improves the top of the ranked list without training an additional verifier.
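The reranking idea mentioned at the end of the abstract can be sketched as follows. This is one plausible reading only: each candidate reactant set carries a generator score and an embedding, and candidates are reordered by the generator score plus a weighted cosine similarity to a reference embedding. The abstract does not say which representations are compared or how scores are combined, so the `rerank` function, the `alpha` weight, and the choice of reference vector are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(candidates, reference, alpha=1.0):
    """Reorder candidates by base score plus alpha * similarity to the reference.

    candidates: list of (name, base_score, embedding) tuples.
    reference:  embedding the candidates are compared against (assumed given).
    alpha:      hypothetical weight on the similarity term; 0 keeps the base order.
    Returns the candidate names, best first. No extra verifier model is trained.
    """
    scored = [
        (base_score + alpha * cosine(embedding, reference), name)
        for name, base_score, embedding in candidates
    ]
    # Sort on the combined score only; ties keep their original relative order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


if __name__ == "__main__":
    # Toy data: "B" has a lower generator score but matches the reference better.
    toy = [("A", 0.5, [1.0, 0.0]), ("B", 0.4, [0.0, 1.0])]
    print(rerank(toy, reference=[0.0, 1.0], alpha=1.0))
```

With the toy data, the similarity term promotes candidate "B" above "A"; setting `alpha=0.0` recovers the generator's original ordering.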
