LGQM · Jan 26, 2023

Efficiently predicting high resolution mass spectra with graph neural networks

arXiv:2301.11419v1 · 26 citations · h-index: 54
Originality: Highly original
AI Analysis

The paper addresses what its authors call the primary open problem in computational metabolomics: identifying small molecules from their mass spectra, both efficiently and accurately.

The paper tackles the problem of predicting high-resolution mass spectra for small molecule identification by modeling it as a mapping from molecular graphs to probability distributions over formulas, achieving significantly lower prediction error and orders-of-magnitude faster runtime than state-of-the-art methods.

Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas. We discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GrAFF-MS - achieving significantly lower prediction error and orders-of-magnitude faster runtime than state-of-the-art methods.
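
The fixed-vocabulary idea in the abstract can be illustrated with a toy sketch. This is not the GrAFF-MS implementation: the frequency-based vocabulary selection, the function names (`build_vocabulary`, `coverage`, `predict_spectrum`), and the example spectra are all assumptions made for illustration. It shows how a small formula vocabulary might be chosen, how much peak intensity that vocabulary retains, and how one logit per vocabulary entry turns into a probability distribution over formulas, which is what makes the task resemble graph classification.

```python
from collections import Counter
import math


def build_vocabulary(spectra, size):
    """Keep the `size` most frequently observed formulas.

    Assumption: frequency ranking is a stand-in heuristic for this sketch,
    not necessarily the selection rule used in the paper.
    """
    counts = Counter(formula for spectrum in spectra for formula in spectrum)
    return [formula for formula, _ in counts.most_common(size)]


def coverage(spectra, vocabulary):
    """Fraction of total peak intensity carried by formulas in the vocabulary."""
    vocab = set(vocabulary)
    total = sum(sum(spectrum.values()) for spectrum in spectra)
    kept = sum(
        intensity
        for spectrum in spectra
        for formula, intensity in spectrum.items()
        if formula in vocab
    )
    return kept / total


def predict_spectrum(logits, vocabulary):
    """Softmax over the vocabulary: one probability per formula."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return {formula: e / norm for formula, e in zip(vocabulary, exps)}


# Hypothetical spectra: each maps a peak's formula to its relative intensity.
spectra = [
    {"C6H5": 0.6, "C2H3O": 0.3, "C9H11N": 0.1},
    {"C6H5": 0.5, "C2H3O": 0.4, "C4H9": 0.1},
    {"C6H5": 0.7, "C3H7": 0.3},
]

vocab = build_vocabulary(spectra, size=2)
cov = coverage(spectra, vocab)

# In a real model the logits would come from a graph neural network applied
# to the input molecular graph; here they are placeholder numbers.
pred = predict_spectrum([2.0, 1.0], vocab)
print(vocab, round(cov, 3), pred)
```

In this toy data, two of five formulas retain most of the intensity, which mirrors in miniature the abstract's observation that a vocabulary of about 2% of observed formulas closely approximates a large corpus of spectra.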

Code Implementations: 1 repo