LG | Feb 23

De novo molecular structure elucidation from mass spectra via flow matching

arXiv:2602.19912v1 | h-index: 29
Originality: Highly original
AI Analysis

The paper addresses a central obstacle to biological insight, metabolite discovery, and chemical research: accurately elucidating molecular structures from mass spectra.

The paper tackles the difficult inverse problem of translating mass spectra into full molecular structures. It reports state-of-the-art performance, accurately translating up to 45% of molecular mass spectra into their corresponding molecular representations, an improvement of up to fourteen-fold over prior methods.

Mass spectrometry is a powerful and widely used tool for identifying molecular structures due to its sensitivity and ability to profile complex samples. However, translating spectra into full molecular structures is a difficult, under-defined inverse problem. Overcoming this problem is crucial for enabling biological insight, discovering new metabolites, and advancing chemical research across multiple fields. To this end, we develop MSFlow, a two-stage encoder-decoder flow-matching generative model that achieves state-of-the-art performance on the structure elucidation task for small molecules. In the first stage, we adopt a formula-restricted transformer model for encoding mass spectra into a continuous and chemically informative embedding space, while in the second stage, we train a flow-matching decoder model to reconstruct molecules from latent embeddings of mass spectra. We present ablation studies demonstrating the importance of using information-preserving molecular descriptors for encoding mass spectra and motivate the use of our discrete flow-based decoder. Our rigorous evaluation demonstrates that MSFlow can accurately translate up to 45 percent of molecular mass spectra into their corresponding molecular representations, an improvement of up to fourteen-fold over the current state-of-the-art. A trained version of MSFlow is made publicly available on GitHub for non-commercial users.
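
To make the idea of a discrete flow-based decoder more concrete, here is a minimal sketch of how training examples are often built for discrete flow matching with a masking path. This is an illustration under assumptions, not MSFlow's implementation: the abstract does not state which probability path, tokenization, or conditioning scheme the authors use. The names MASK, corrupt, and training_pair are hypothetical, and the conditioning on the spectrum embedding is omitted entirely.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def corrupt(tokens, t, rng):
    """Draw x_t on a masking path: keep each token with probability t, else mask it.

    t = 0 gives a fully masked sequence (pure noise); t = 1 returns the clean sequence.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [tok if rng.random() < t else MASK for tok in tokens]


def training_pair(tokens, rng):
    """Build one (noisy input, time, targets) example for a denoising decoder.

    Targets are the clean tokens at masked positions only. A real decoder would
    also receive the spectrum embedding as conditioning (assumed, omitted here).
    """
    t = rng.random()
    noisy = corrupt(tokens, t, rng)
    targets = {i: tok for i, (tok, n) in enumerate(zip(tokens, noisy)) if n == MASK}
    return noisy, t, targets
```

For example, training_pair(list("CCO"), random.Random(0)) returns a partially masked copy of the token list, the sampled time t, and a dictionary mapping each masked position to the token a decoder would be trained to recover there. At generation time such a decoder would start from a fully masked sequence and progressively fill in tokens.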
