A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns

arXiv:2601.03689v1
h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the need for effective reaction descriptors in data-driven chemistry and offers a tool for reaction fingerprinting and analysis, although it is an incremental advance because it builds on existing pre-training methods.

The study tackles the scarcity of general-purpose reaction descriptors by introducing RXNEmb, a new reaction-level descriptor derived from a pre-trained model. RXNEmb enabled a data-driven re-clustering of the USPTO-50k dataset that better reflects bond-change similarities, and it provided interpretable insights into reaction mechanisms.
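The summary does not say which clustering algorithm was used for the re-clustering. As a minimal sketch, assuming each reaction has already been encoded as a fixed-length RXNEmb vector, plain k-means on L2-normalised vectors groups reactions whose embeddings point in similar directions, because for unit vectors squared Euclidean distance equals 2 minus twice the cosine similarity. The helper names (`l2_normalize`, `kmeans`) and the toy three-dimensional vectors are hypothetical stand-ins, not the authors' code or data.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(embeddings: Sequence[Sequence[float]], k: int,
           iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means on unit-length embeddings (hypothetical helper).

    Returns one cluster id per input reaction embedding.
    """
    points = [l2_normalize(e) for e in embeddings]
    rng = random.Random(seed)
    # Initialise centroids with k distinct reactions chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each reaction joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "embeddings": two reactions of each of two made-up bond-change types.
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
labels = kmeans(toy, k=2)
```

Reactions 0 and 1 end up in one cluster and reactions 2 and 3 in the other; the cluster ids themselves are arbitrary. At the scale of USPTO-50k, a vectorised library implementation would be the practical choice, and the number of clusters would need to be chosen by some validation criterion rather than fixed by hand.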

With the rise of data-driven reaction prediction models, effective reaction descriptors are crucial for bridging the gap between real-world chemistry and its digital representations. However, general-purpose, reaction-wise descriptors remain scarce. This study introduces RXNEmb, a novel reaction-level descriptor derived from RXNGraphormer, a model that is pre-trained to distinguish real reactions from fictitious ones with erroneous bond changes and thereby learns intrinsic bond formation and cleavage patterns. We demonstrate its utility through a data-driven re-clustering of the USPTO-50k dataset, which yields a classification that reflects bond-change similarities more directly than rule-based categories do. Combined with dimensionality reduction, RXNEmb enables visualization of reaction-space diversity. Furthermore, attention weight analysis reveals the model's focus on chemically critical sites, providing mechanistic insight. RXNEmb serves as a powerful, interpretable tool for reaction fingerprinting and analysis, paving the way for more data-centric approaches in reaction analysis and discovery.
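The abstract names two further analyses without giving their details: projecting embeddings to two dimensions for visualization, and reading attention weights to see which atoms the model focuses on. The sketch below shows one generic way to do each, assuming embeddings are plain lists of floats and that attention is available as a per-head, atom-by-atom weight matrix. PCA computed by power iteration stands in for whatever dimensionality-reduction method the authors used, and `pca_2d` and `top_attended_atoms` are hypothetical helpers, not part of RXNGraphormer.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _leading_eigvec(m: Matrix, exclude: Sequence[Sequence[float]] = (),
                    iters: int = 300, seed: int = 0) -> List[float]:
    """Power iteration on a symmetric matrix, kept orthogonal to `exclude`."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(m))]
    for _ in range(iters):
        w = [_dot(row, v) for row in m]
        for u in exclude:  # Gram-Schmidt against components already found
            d = _dot(w, u)
            w = [a - d * b for a, b in zip(w, u)]
        norm = math.sqrt(_dot(w, w)) or 1.0
        v = [x / norm for x in w]
    return v


def pca_2d(data: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project each embedding onto the top two principal components."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centred = [[x - mu for x, mu in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
           for i in range(dim)]
    pc1 = _leading_eigvec(cov)
    pc2 = _leading_eigvec(cov, exclude=[pc1])
    return [(_dot(r, pc1), _dot(r, pc2)) for r in centred]


def top_attended_atoms(attn: Sequence[Sequence[Sequence[float]]],
                       k: int = 3) -> List[int]:
    """attn[h][i][j] = weight query atom i gives key atom j in head h.

    Averages over heads, sums over query atoms, and returns the indices of
    the k atoms that receive the most attention (highest first).
    """
    heads, n = len(attn), len(attn[0])
    received = [sum(attn[h][i][j] for h in range(heads) for i in range(n)) / heads
                for j in range(n)]
    return sorted(range(n), key=lambda j: received[j], reverse=True)[:k]


coords = pca_2d([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
# Two made-up heads over four atoms; both attend mostly to atom 2.
attn = [
    [[0.1, 0.1, 0.7, 0.1]] * 4,
    [[0.2, 0.1, 0.6, 0.1]] * 4,
]
top = top_attended_atoms(attn, k=2)
```

Here `coords` places the first two toy embeddings at roughly (±2, 0) and the last two at (0, ±1), since the sign of each principal component is arbitrary, and `top` is `[2, 0]`. Summing received attention is only one of several reasonable ways to aggregate attention maps, and a non-linear projection such as t-SNE or UMAP may separate reaction classes more clearly than PCA.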

