Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra
This work addresses a challenge that researchers in metabolomics and exposomics face: identifying unknown compounds in mass spectrometry data. Its contribution is incremental, because it focuses on comparative evaluation rather than introducing new methods.
The study systematically evaluated state-of-the-art algorithms for predicting chemical formulas and structures from tandem mass spectra. It established performance baselines and identified the bottlenecks that must be addressed to improve compound identification in metabolomics and exposomics.
Liquid chromatography mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure the detectable small molecules in biological samples. The results support hypothesis-generating discovery of metabolic changes and disease mechanisms, and provide information about environmental exposures and their effects on human health. Both fields are made possible by the high resolving power of LC and the high mass accuracy of MS. However, most signals from such studies still cannot be identified or annotated by conventional library searching, because existing spectral libraries are far from covering the vast chemical space captured by LC-MS/MS. To address this challenge and unlock the full potential of metabolomics and exposomics, a number of computational approaches have been developed to predict compound formulas and structures from tandem mass spectra. Published assessments of these approaches, however, have used different datasets and evaluation criteria, which makes their results difficult to compare. To select prediction workflows for practical applications and to identify areas for further improvement, we carried out a systematic evaluation of state-of-the-art prediction algorithms. Specifically, we evaluated the accuracy of formula prediction and structure prediction for different adduct types. The findings establish realistic performance baselines, identify critical bottlenecks, and provide guidance for further improving MS-based compound prediction.
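To illustrate why mass accuracy and adduct type both matter for formula prediction, the following is a deliberately simplified sketch, not any of the algorithms evaluated in the study. It subtracts the mass shift of an assumed adduct ([M+H]+ or [M+Na]+) from a precursor m/z, then enumerates C/H/N/O compositions whose monoisotopic mass matches the implied neutral mass within a ppm tolerance. The element limits, the 5 ppm default, the integer ring-plus-double-bond filter, and the function names (`neutral_mass`, `formula_string`, `candidate_formulas`) are hypothetical choices made here for illustration. Practical formula predictors typically also score isotope patterns and fragment spectra.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Mass added to the neutral molecule M to form each singly charged ion.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276467,   # proton
    "[M+Na]+": 22.98922070,  # sodium atom minus one electron
}


def neutral_mass(counts):
    """Monoisotopic mass of a composition such as {"C": 6, "H": 12, "O": 6}."""
    return sum(MONO[el] * n for el, n in counts.items())


def formula_string(counts):
    """Hill-order string (C, H, then N, O); counts of 1 are left implicit."""
    parts = []
    for el in ("C", "H", "N", "O"):
        n = counts.get(el, 0)
        if n > 0:
            parts.append(el if n == 1 else f"{el}{n}")
    return "".join(parts)


def candidate_formulas(mz, adduct, ppm_tol=5.0, max_c=30, max_h=60, max_n=5, max_o=15):
    """Return (formula, ppm_error) pairs matching the precursor m/z, best first.

    Hypothetical helper: element limits and the ppm tolerance are assumptions.
    """
    target = mz - ADDUCT_SHIFT[adduct]  # neutral mass implied by the assumed adduct
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
        # Hydrogen is the only element left, so the best-fitting count is the
        # nearest integer; any other count would be off by about 1 Da.
        h = round((target - heavy) / MONO["H"])
        if not 0 <= h <= max_h:
            continue
        # Heuristic filter: RDBE = c - h/2 + n/2 + 1 must be a non-negative integer.
        twice_rdbe = 2 * c + 2 + n - h
        if twice_rdbe < 0 or twice_rdbe % 2:
            continue
        counts = {"C": c, "H": h, "N": n, "O": o}
        ppm = (neutral_mass(counts) - target) / target * 1e6
        if abs(ppm) <= ppm_tol:
            hits.append((formula_string(counts), ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # Simulated [M+H]+ precursor of glucose (C6H12O6).
    glucose_mh = neutral_mass({"C": 6, "H": 12, "O": 6}) + ADDUCT_SHIFT["[M+H]+"]
    for formula, ppm in candidate_formulas(glucose_mh, "[M+H]+")[:5]:
        print(f"{formula:>12s} {ppm:+.2f} ppm")
```

If the same m/z is interpreted under the wrong adduct (for example, a sodium adduct read as [M+H]+), the searched neutral mass shifts by roughly 22 Da and the correct formula drops out of the candidate list entirely. This gives one intuition for why formula prediction accuracy can differ by adduct type.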