CHEM-PH AI Feb 10

NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers

arXiv:2602.10158v1
Originality: Highly original
AI Analysis

NMRTrans addresses the time-consuming, expertise-dependent task of interpreting NMR spectra, and the reported result is a substantial, specific gain rather than an incremental improvement.

The paper introduces NMRTrans, a Transformer model for molecular structure elucidation trained on a large-scale corpus of experimental NMR spectra. It achieves state-of-the-art performance, improving Top-10 Accuracy by +17.82 points (61.15% vs. 43.33%) over the strongest baseline.
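The headline number above is an absolute gain in percentage points on a Top-10 Accuracy metric. As a minimal sketch of how such a metric is commonly computed, the following counts an example as a hit when the true structure appears among the first k ranked candidates. The function names, the toy SMILES strings, and the exact-string match are illustrative assumptions, not the paper's evaluation code; a real evaluation would typically compare canonicalized structures.

```python
def top_k_accuracy(ranked_candidates, targets, k=10):
    """Fraction of examples whose target appears among the first k candidates."""
    if len(ranked_candidates) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if not targets:
        raise ValueError("need at least one example")
    hits = sum(
        1
        for candidates, target in zip(ranked_candidates, targets)
        if target in candidates[:k]  # assumption: exact string match
    )
    return hits / len(targets)


def point_gain(model_pct, baseline_pct):
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(model_pct - baseline_pct, 2)


# Hypothetical toy predictions (SMILES strings), not data from the paper.
predictions = [
    ["CCO", "CCN", "CC=O"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CC(=O)O", "CCO"],
]
truth = ["CCN", "C1CCCCC1", "CCC"]

toy_top1 = top_k_accuracy(predictions, truth, k=1)
toy_top2 = top_k_accuracy(predictions, truth, k=2)

# The two percentages reported in the summary above.
reported_gain = point_gain(61.15, 43.33)
```

With the reported figures, `reported_gain` is 17.82: the improvement is stated in points, not as a relative percentage of the baseline.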

Nuclear Magnetic Resonance (NMR) spectroscopy is fundamental for molecular structure elucidation, yet interpreting spectra at scale remains time-consuming and highly expertise-dependent. While recent spectrum-as-language modeling and retrieval-based methods have shown promise, they rely heavily on large corpora of computed spectra and exhibit notable performance drops when applied to experimental measurements. To address these issues, we build NMRSpec, a large-scale corpus of experimental $^1$H and $^{13}$C spectra mined from chemical literature, and propose NMRTrans, which models spectra as unordered peak sets and thereby aligns the model's inductive bias with the physical nature of NMR. To the best of our knowledge, NMRTrans is the first NMR Transformer trained solely on large-scale experimental spectra. It achieves state-of-the-art performance on experimental benchmarks, improving Top-10 Accuracy over the strongest baseline by +17.82 points (61.15% vs. 43.33%). These results underscore the importance of experimental data and structure-aware architectures for reliable NMR structure elucidation.
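The abstract's point about treating a spectrum as an unordered peak set can be illustrated with a small sketch. The code below uses simple mean pooling over per-peak features (a Deep Sets style stand-in, not the Set Transformer architecture used by NMRTrans) and contrasts it with a position-weighted encoding that depends on peak order. The feature function, the function names, and the example chemical shifts are all hypothetical.

```python
import math


def embed_peak(shift_ppm, dim=4):
    """Map one chemical shift to a small feature vector (hypothetical features)."""
    return [math.sin(shift_ppm / (10.0 ** i)) for i in range(dim)]


def encode_peak_set(shifts, dim=4):
    """Mean-pool per-peak features; the order of the peaks is never used."""
    if not shifts:
        raise ValueError("need at least one peak")
    features = [embed_peak(s, dim) for s in shifts]
    # math.fsum returns a correctly rounded sum, so the result does not
    # depend on the order in which the peaks are listed.
    return [math.fsum(column) / len(shifts) for column in zip(*features)]


def encode_peak_sequence(shifts, dim=4):
    """Order-dependent contrast: each peak is weighted by its list position."""
    if not shifts:
        raise ValueError("need at least one peak")
    features = [embed_peak(s, dim) for s in shifts]
    return [
        math.fsum((pos + 1) * feat[i] for pos, feat in enumerate(features))
        / len(shifts)
        for i in range(dim)
    ]


# Illustrative 13C shifts in ppm, not taken from NMRSpec.
peaks = [170.9, 60.4, 21.0, 14.2]
shuffled = [21.0, 170.9, 14.2, 60.4]

set_same = all(
    math.isclose(a, b)
    for a, b in zip(encode_peak_set(peaks), encode_peak_set(shuffled))
)
seq_same = all(
    math.isclose(a, b)
    for a, b in zip(encode_peak_sequence(peaks), encode_peak_sequence(shuffled))
)
```

Here `set_same` is true and `seq_same` is false: the set encoder gives the same representation however the peak list is ordered, while the position-weighted encoder does not. A peak list has no inherent order, which is the physical property the abstract says the set-based inductive bias is meant to match.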

