LG · CHEM-PH · Apr 30

CompleteRXN: Toward Completing Open Chemical Reaction Databases

arXiv:2605.00222 · 34.4
Predicted impact: top 69% in LG · last 90 days · Originality: Incremental advance
AI Analysis

The paper addresses the problem of incomplete reaction data for chemoinformatics practitioners, but the gap between benchmark and real-world performance suggests the progress is incremental.

CompleteRXN introduces a benchmark and models for completing missing byproducts, co-reactants, and stoichiometric coefficients in chemical reaction databases. The authors' Constrained Reaction Balancer (CRB) reaches 99.20% equivalence accuracy on the random split and 91.12% on the extreme out-of-distribution split, but performance drops substantially on the full uncurated USPTO data.

Chemical reaction datasets such as USPTO suffer from substantial incompleteness, frequently missing byproducts, co-reactants, and stoichiometric coefficients. This limits their applicability and reliability in downstream applications. Here, we introduce CompleteRXN, a large-scale supervised benchmark for reaction completion under realistic missing-data conditions. We construct a dataset of aligned incomplete and atom-balanced reactions by mapping USPTO records to curated mechanistic reactions. We evaluate representative baselines, including a novel encoder-decoder reaction completion model with constrained decoding, the Constrained Reaction Balancer (CRB), and a recent algorithmic method, SynRBL. On our CompleteRXN benchmark, the CRB achieves high performance across splits of increasing difficulty, reaching 99.20% equivalence accuracy on the random split and 91.12% on the extreme out-of-distribution split. SynRBL produces many balanced and chemically plausible completions, but with lower accuracy on the benchmark test splits. Across all methods, performance degrades with increasing incompleteness. We observe a substantial drop when evaluating on reactions outside the benchmark (full uncurated USPTO), highlighting the gap between benchmark performance and practical robustness and motivating future work.
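To make the distinction between an incomplete record and an atom-balanced reaction concrete, the following is a minimal sketch of an atom-balance check. It is not the CRB or SynRBL, and it makes simplifying assumptions: species are written as plain molecular formulas (no brackets, charges, or SMILES, which USPTO actually uses), and the helper names `parse_formula`, `side_atoms`, and `atom_deficit` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C2H6O'.

    Assumption: no brackets, charges, or isotopes.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_atoms(formulas) -> Counter:
    """Sum the atom counts over all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def atom_deficit(reactants, products):
    """Return (atoms missing from products, atoms missing from reactants).

    Both dictionaries are empty when the reaction is atom-balanced.
    """
    left = side_atoms(reactants)
    right = side_atoms(products)
    # Counter subtraction keeps only strictly positive differences.
    return dict(left - right), dict(right - left)


# Esterification recorded without its byproduct:
# acetic acid + ethanol -> ethyl acetate
incomplete = atom_deficit(["C2H4O2", "C2H6O"], ["C4H8O2"])

# The same reaction with water added as the byproduct.
completed = atom_deficit(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])
```

In this example the incomplete record is short by two hydrogens and one oxygen on the product side, which corresponds to the missing water byproduct. A balance check of this kind only flags that something is missing; proposing which species and coefficients fill the gap is the completion task the benchmark targets.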

