Protect$^*$: Steerable Retrosynthesis through Neuro-Symbolic State Encoding

Shreyas Vinaya Sathyanarayana, Shah Rahil Kirankumar, Sharanabasava D. Hiremath, Bharath Ramsundar

arXiv:2602.13419v11.2h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the challenge of giving chemists fine-grained control over LLM-driven retrosynthesis, though it appears incremental because it builds on existing LLM and symbolic methods.

The authors tackle the problem of steering LLMs away from chemically sensitive sites during retrosynthesis. They introduce Protect$^*$, a neuro-symbolic framework that combines rule-based reasoning with neural models, and report reliable, expert-level autonomy in case studies that include a novel pathway for Erythromycin B.

Large Language Models (LLMs) have shown remarkable potential in scientific domains like retrosynthesis, yet they often lack the fine-grained control necessary to navigate complex problem spaces without error. A critical challenge is directing an LLM to avoid specific, chemically sensitive sites on a molecule, a task where unconstrained generation can lead to invalid or undesirable synthetic pathways. In this work, we introduce Protect$^*$, a neuro-symbolic framework that grounds the generative capabilities of LLMs in rigorous chemical logic. Our approach combines automated rule-based reasoning, which draws on a comprehensive database of 55+ SMARTS patterns and 40+ characterized protecting groups, with the generative intuition of neural models. The system operates via a hybrid architecture: an ``automatic mode'' in which symbolic logic deterministically identifies and guards reactive sites, and a ``human-in-the-loop mode'' that integrates expert strategic constraints. Through ``active state tracking,'' we inject hard symbolic constraints into the neural inference process via a dedicated protection state linked to canonical atom maps. We demonstrate this neuro-symbolic approach through case studies on complex natural products, including the discovery of a novel synthetic pathway for Erythromycin B, showing that grounding neural generation in symbolic logic enables reliable, expert-level autonomy.
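The idea of a protection state linked to atom maps can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ProtectionState`, the function `filter_candidates`, the atom-map numbers, and the candidate steps are all hypothetical, and the sketch assumes that each step proposed by the neural model comes with the set of atom-map numbers it would react at. It only shows how a symbolic state could act as a hard filter on generated steps.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionState:
    """Hypothetical protection state keyed by canonical atom-map numbers."""

    # atom-map number -> protecting group label (e.g. "TBS")
    protected: dict = field(default_factory=dict)

    def protect(self, atom_map: int, group: str) -> None:
        """Mark an atom as guarded by the given protecting group."""
        self.protected[atom_map] = group

    def deprotect(self, atom_map: int) -> None:
        """Remove the guard from an atom, if one is present."""
        self.protected.pop(atom_map, None)

    def violations(self, reacting_atoms: set) -> set:
        """Return the protected atoms that a proposed step would touch."""
        return reacting_atoms & set(self.protected)


def filter_candidates(state: ProtectionState, candidates: list) -> list:
    """Keep only candidate steps whose reacting atoms avoid protected sites.

    Each candidate is a (name, reacting_atom_maps) pair. In a real system
    these would come from the neural model; here they are made up.
    """
    return [name for name, atoms in candidates if not state.violations(atoms)]


state = ProtectionState()
state.protect(7, "TBS")  # assumed: a site flagged automatically by a rule
state.protect(12, "Bn")  # assumed: a site an expert chose to guard

candidates = [
    ("disconnect C3-C4", {3, 4}),
    ("oxidize O7", {7}),
    ("cleave C11-O12", {11, 12}),
]
allowed = filter_candidates(state, candidates)
print(allowed)  # ['disconnect C3-C4']
```

Keying the state by atom-map number rather than by substructure means the constraint follows a specific atom across steps. In this sketch, calling `state.deprotect(7)` makes the "oxidize O7" step admissible again.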
