QM · LG · Mar 11, 2023

Prefix-Tree Decoding for Predicting Mass Spectra from Molecules

arXiv:2303.06470v3 · 27 citations · h-index: 53
Originality: Incremental advance
AI Analysis

This work addresses limitations in computational tools for mass spectra prediction, which is important for metabolite discovery in clinical applications, but it appears incremental as it builds on existing encoding-decoding approaches.

The paper tackles the problem of predicting mass spectra from molecules by introducing a method that treats spectra as sets of molecular formulae, using a prefix tree structure for decoding to overcome combinatorial challenges, and reports promising empirical results.

Computational predictions of mass spectra from molecules have enabled the discovery of clinically relevant metabolites. However, such predictive tools are still limited as they occupy one of two extremes, either operating (a) by fragmenting molecules combinatorially with overly rigid constraints on potential rearrangements and poor time complexity or (b) by decoding lossy and nonphysical discretized spectra vectors. In this work, we use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms. After first encoding an input molecular graph, we decode a set of molecular subformulae, each of which specifies a predicted peak in the mass spectrum, the intensities of which are predicted by a second model. Our key insight is to overcome the combinatorial possibilities for molecular subformulae by decoding the formula set using a prefix tree structure, atom-type by atom-type, representing a general method for ordered multiset decoding. We show promising empirical results on mass spectra prediction tasks.
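The prefix-tree idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_subformulae`, the fixed element ordering, and the `keep` callback are all assumptions made for illustration. In the paper the decision to extend a prefix comes from a learned model conditioned on the encoded molecular graph; here `keep` is a hand-written stand-in for that decision.

```python
from typing import Callable, Dict, List, Tuple

Formula = Dict[str, int]      # element symbol -> atom count (a multiset of atoms)
Prefix = Tuple[int, ...]      # counts chosen so far, in element order

def decode_subformulae(
    root: Formula,
    element_order: List[str],
    keep: Callable[[Prefix, str, int], bool],
) -> List[Formula]:
    """Expand a prefix tree one atom type at a time.

    Each tree level fixes the count of one element. A node survives only
    if `keep` accepts it, so rejected prefixes are never expanded further.
    `keep` is a hypothetical placeholder for a learned scoring model.
    """
    frontier: List[Prefix] = [()]  # the tree root: no counts chosen yet
    for element in element_order:
        max_count = root.get(element, 0)  # a subformula cannot exceed the parent
        next_frontier: List[Prefix] = []
        for prefix in frontier:
            for count in range(max_count + 1):
                if keep(prefix, element, count):
                    next_frontier.append(prefix + (count,))
        frontier = next_frontier
    # Leaves of the tree are complete subformulae.
    return [dict(zip(element_order, leaf)) for leaf in frontier]

# Toy usage: parent formula C2H4O with an arbitrary rule that keeps only
# even hydrogen counts (purely illustrative, not a chemical constraint).
root = {"C": 2, "H": 4, "O": 1}
order = ["C", "H", "O"]
kept = decode_subformulae(root, order, lambda p, e, c: e != "H" or c % 2 == 0)
print(len(kept), "candidate subformulae")
```

Pruning at the prefix level is what keeps the search tractable in this sketch: a rejected prefix removes its whole subtree, so the number of model calls scales with the surviving nodes rather than with the full product of per-element counts.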

Code Implementations: 1 repo
