Towards Less Biased Data-driven Scoring with Deep Learning-Based End-to-end Database Search in Tandem Mass Spectrometry
This addresses the issue of biased and heuristic scoring in proteomics for researchers, offering a novel data-driven approach that is incremental in improving accuracy and handling modifications.
The paper tackled the problem of biased scoring in peptide identification for mass spectrometry-based proteomics by introducing DeepSearch, a deep learning-based end-to-end database search method, which reduced bias and eliminated the need for statistical estimation, achieving competitive identification rates across diverse datasets.
Peptide identification in mass spectrometry-based proteomics is crucial for understanding protein function and dynamics. Traditional database search methods, though widely used, rely on heuristic scoring functions and statistical estimations have to be introduced for a higher identification rate. Here, we introduce DeepSearch, the first deep learning-based end-to-end database search method for tandem mass spectrometry. DeepSearch leverages a modified transformer-based encoder-decoder architecture under the contrastive learning framework. Unlike conventional methods that rely on ion-to-ion matching, DeepSearch adopts a data-driven approach to score peptide spectrum matches. DeepSearch is also the first deep learning-based method that can profile variable post-translational modifications in a zero-shot manner. We showed that DeepSearch's scoring scheme expressed less bias and did not require any statistical estimation. We validated DeepSearch's accuracy and robustness across various datasets, including those from species with diverse protein compositions and a modification-enriched dataset. DeepSearch sheds new light on database search methods in tandem mass spectrometry.