Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation
For chemists and AI researchers in organic synthesis, this framework provides a more interpretable and accurate route evaluation tool, addressing the oversimplification in existing data-driven methods.
The paper tackles multi-step synthetic route evaluation in organic chemistry by introducing an expert-augmented framework that combines machine learning with chemists' domain knowledge. The system achieves a Spearman correlation of 0.78 for category prediction and 60.2% top-1 ranking accuracy for score prediction, substantially outperforming the previous baseline of 17.5%.
Selecting efficient multi-step synthetic routes is a central challenge in organic synthesis, particularly in medicinal and process chemistry, where route choice directly impacts feasibility, cost, and development efficiency. Data-driven assessment systems often oversimplify the multi-objective nature of synthesis design and rely on proxy datasets, such as patent routes, rather than universally grounded criteria. To address this, we introduce an expert-augmented, data-driven scoring framework that integrates machine learning with chemists' domain knowledge for both numerical and explainable route assessment. A DeepSets-based model is trained using tree edit distance between reference and machine-generated routes, and then fine-tuned with expert evaluations to produce both quantitative scores and interpretable qualitative categories: Good, Plausible, and Bad. The resulting system achieves a Spearman correlation coefficient of 0.78 and a Pearson correlation of 0.77 for category assessment prediction, and 60.2% top-1 ranking accuracy for score prediction, substantially outperforming the previous baseline of 17.5%.
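To make the two headline metrics concrete, the following is a minimal sketch of how a Spearman correlation and a top-1 ranking accuracy could be computed. It is not the authors' evaluation code. Several things here are assumptions for illustration only: the ordinal encoding of the categories (Bad = 0, Plausible = 1, Good = 2), the reading of "top-1 accuracy" as the fraction of targets whose highest-scored candidate is the reference route, and all function names, route identifiers, and toy values.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top1_accuracy(
    scores_by_target: Dict[str, Dict[str, float]],
    reference_by_target: Dict[str, str],
) -> float:
    """Fraction of targets whose best-scored route is the reference route.

    This definition is an assumption made for the sketch.
    """
    hits = 0
    for target, scores in scores_by_target.items():
        best_route = max(scores, key=scores.get)
        hits += best_route == reference_by_target[target]
    return hits / len(scores_by_target)


# Hypothetical ordinal encoding of the qualitative categories.
CATEGORY_TO_ORDINAL = {"Bad": 0, "Plausible": 1, "Good": 2}

# Toy data, invented for illustration.
expert = [CATEGORY_TO_ORDINAL[c] for c in ["Good", "Plausible", "Bad", "Good", "Plausible"]]
predicted = [CATEGORY_TO_ORDINAL[c] for c in ["Good", "Plausible", "Plausible", "Good", "Bad"]]
category_agreement = spearman(expert, predicted)

toy_scores = {
    "target_1": {"route_a": 0.9, "route_b": 0.1},
    "target_2": {"route_a": 0.2, "route_b": 0.8, "route_c": 0.5},
}
toy_references = {"target_1": "route_a", "target_2": "route_c"}
ranking_accuracy = top1_accuracy(toy_scores, toy_references)
```

Because Spearman correlation works on ranks, it depends only on the order of the category encoding, not on the specific integers chosen, which is why an ordinal mapping is enough for this kind of comparison.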