LG, DATA-AN · Feb 7, 2025

SpecTUS: Spectral Translator for Unknown Structures annotation from EI-MS spectra

arXiv:2502.05114v1 · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

SpecTUS addresses compound identification in fields such as drug detection and forensics, but it is an incremental improvement over existing methods.

The paper tackles structural annotation of small molecules from low-resolution GC-EI-MS spectra, particularly for compounds that are not in spectral libraries. It shows that the model outperforms standard database search techniques, achieving perfect reconstruction in 43% of cases with a single suggestion and 65% with 10 suggestions.

Compound identification and structure annotation from mass spectra is a well-established task widely applied in drug detection, criminal forensics, small molecule biomarker discovery and chemical engineering. We propose SpecTUS: Spectral Translator for Unknown Structures, a deep neural model that addresses the task of structural annotation of small molecules from low-resolution gas chromatography electron ionization mass spectra (GC-EI-MS). Our model analyzes the spectra in a de novo manner, that is, by direct translation from the spectra into a 2D structural representation. Our approach is particularly useful for analyzing compounds unavailable in spectral libraries. In a rigorous evaluation of our model on the novel structure annotation task across different libraries, we outperformed standard database search techniques by a wide margin. On a held-out testing set, including 28,267 spectra from the NIST database, we show that our model's single suggestion perfectly reconstructs 43% of the subset's compounds. This single suggestion is strictly better than the candidate of the database hybrid search (a common method among practitioners) in 76% of cases. In a still affordable scenario of 10 suggestions, perfect reconstruction is achieved in 65% of cases, and 84% are better than the hybrid search.
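The two kinds of figures quoted in the abstract (perfect reconstruction within k suggestions, and the share of cases strictly better than a baseline) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that structures are compared as already-canonicalized strings (for example canonical SMILES), and that some per-spectrum similarity score between a prediction and the true structure has been computed beforehand. The function names, the toy data, and the scores are all hypothetical.

```python
from typing import Sequence


def top_k_exact_match(
    truths: Sequence[str],
    candidates: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of spectra whose true structure is among the first k candidates.

    Assumes every string is already canonicalized, so string equality
    stands in for structural identity (an assumption of this sketch).
    """
    if len(truths) != len(candidates):
        raise ValueError("truths and candidates must have the same length")
    if not truths:
        return 0.0
    hits = sum(1 for truth, ranked in zip(truths, candidates) if truth in ranked[:k])
    return hits / len(truths)


def fraction_strictly_better(
    model_scores: Sequence[float],
    baseline_scores: Sequence[float],
) -> float:
    """Fraction of spectra where the model's score strictly exceeds the baseline's.

    Scores are hypothetical similarities to the true structure; ties do not count.
    """
    if len(model_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")
    if not model_scores:
        return 0.0
    wins = sum(1 for m, b in zip(model_scores, baseline_scores) if m > b)
    return wins / len(model_scores)


# Toy data, invented for illustration only.
truths = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
candidates = [
    ["CCO", "CO"],
    ["C1CCCCC1", "c1ccccc1"],
    ["CC(=O)OC", "CCC(=O)O"],
    ["CCN"],
]
top1 = top_k_exact_match(truths, candidates, k=1)
top2 = top_k_exact_match(truths, candidates, k=2)
better = fraction_strictly_better([1.0, 0.4, 0.6, 1.0], [0.8, 0.4, 0.7, 0.5])
print(top1, top2, better)
```

In practice the canonicalization step and the similarity measure matter a great deal; both are left out here and would need to follow whatever the paper's evaluation protocol specifies.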

Code Implementations: 2 repos
