Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets
This work targets a known bottleneck in machine-learning-based reaction prediction for organic chemistry: inconsistent predictions and poor generalization.
The paper addresses chemical reaction prediction, targeting two weaknesses of existing models: sensitivity to input permutations and inadequate modeling of substructural interactions. The resulting model, ReaDISH, improves robustness, with an average 8.76% improvement in R² under permutation perturbations.
Chemical reaction prediction remains a fundamental challenge in organic chemistry, where existing machine learning models face two critical limitations: sensitivity to input permutations (molecule/atom orderings) and inadequate modeling of substructural interactions governing reactivity. These shortcomings lead to inconsistent predictions and poor generalization to real-world scenarios. To address these challenges, we propose ReaDISH, a novel reaction prediction model that learns permutation-invariant representations while incorporating interaction-aware features. It introduces two innovations: (1) symmetric difference shingle encoding, which extends the differential reaction fingerprint (DRFP) by representing shingles as continuous high-dimensional embeddings, capturing structural changes while eliminating order sensitivity; and (2) geometry-structure interaction attention, a mechanism that models intra- and inter-molecular interactions at the shingle level. Extensive experiments demonstrate that ReaDISH improves reaction prediction performance across diverse benchmarks. It shows enhanced robustness with an average improvement of 8.76% on R$^2$ under permutation perturbations.
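The idea behind the symmetric difference of shingle sets can be illustrated with a minimal sketch. This is not the ReaDISH or DRFP implementation: it assumes character n-grams of SMILES strings as a toy stand-in for the substructure shingles, and the function names (`shingles`, `side_shingles`, `symmetric_difference_shingles`) are hypothetical. It shows only that a set-based encoding keeps the shingles that differ between reactants and products and is unaffected by the order in which molecules are listed.

```python
from typing import FrozenSet, Iterable, Set


def shingles(smiles: str, n: int = 3) -> Set[str]:
    """Toy shingling (assumption): character n-grams of one SMILES string.

    A real fingerprint would extract substructures from the molecular
    graph; n-grams are used here only to keep the sketch dependency-free.
    """
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def side_shingles(molecules: Iterable[str]) -> Set[str]:
    """Union of the shingles of every molecule on one side of a reaction."""
    result: Set[str] = set()
    for molecule in molecules:
        result |= shingles(molecule)
    return result


def symmetric_difference_shingles(
    reactants: Iterable[str], products: Iterable[str]
) -> FrozenSet[str]:
    """Shingles that occur on exactly one side of the reaction."""
    return frozenset(side_shingles(reactants) ^ side_shingles(products))


# Esterification-like toy example (SMILES chosen for illustration only).
reactants = ["CC(=O)O", "OCC"]
products = ["CC(=O)OCC", "O"]

original = symmetric_difference_shingles(reactants, products)
# Listing the molecules in a different order gives the same set.
permuted = symmetric_difference_shingles(reactants[::-1], products[::-1])
```

Because sets are unordered, `original` and `permuted` are equal. Note that the toy n-gram shingling is still sensitive to how each SMILES string is written (atom ordering); handling that requires graph-based substructure extraction, which this sketch does not attempt.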