DPST: De Novo Peptide Sequencing with Amino-Acid-Aware Transformers
This work addresses a domain-specific bottleneck in proteomics by providing a more accurate method for de novo peptide sequencing. The contribution is incremental, as it builds on existing transformer-based approaches.
The paper tackles de novo peptide sequencing from tandem mass spectrometry data. It introduces DPST, which uses amino-acid-aware transformers to avoid over-trimming the receptive fields of the spectra, and reports a 12% to 19% gain in peptide accuracy over state-of-the-art methods.
De novo peptide sequencing aims to recover the amino acid sequence of a peptide from tandem mass spectrometry (MS) data. Existing approaches to de novo analysis enumerate MS evidence for all amino acid classes during inference. This over-trims the receptive fields of the MS data and restricts the MS evidence associated with the subsequent, not-yet-decoded amino acids. Our approach, DPST, circumvents these limitations with two key components: (1) a confidence value aggregation encoder that sketches spectrum representations according to amino-acid-based connectivity within the MS data; and (2) a global-local fusion decoder that progressively assimilates contextualized spectrum representations with predefined localized MS evidence and amino acid priors. Both components are derived from a closed-form solution and selectively attend to informative amino-acid-aware MS representations. Extensive empirical studies show that DPST outperforms state-of-the-art approaches by 12% to 19% in peptide accuracy.
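To make the idea of amino-acid-based connectivity concrete, the following minimal Python sketch links two spectrum peaks whenever their m/z gap matches the monoisotopic residue mass of an amino acid. It is an illustration only, not DPST's encoder: the function name `amino_acid_edges`, the tolerance `tol`, the reduced residue table, and the assumption of singly charged fragment ions are all hypothetical choices made for this example.

```python
from itertools import combinations

# Monoisotopic residue masses in Da for a subset of amino acids
# (subset chosen for brevity; a real table would cover all residues).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L": 113.08406,  # isoleucine has the same mass and cannot be told apart here
}


def amino_acid_edges(peaks, tol=0.02):
    """Return (i, j, residue) for every peak pair whose m/z gap matches a residue mass.

    Assumes `peaks` is a list of m/z values sorted in ascending order and that
    all fragment ions are singly charged (hypothetical simplification).
    """
    edges = []
    for i, j in combinations(range(len(peaks)), 2):
        gap = peaks[j] - peaks[i]
        for residue, mass in RESIDUE_MASS.items():
            if abs(gap - mass) <= tol:
                edges.append((i, j, residue))
    return edges


# Toy spectrum: the gaps 0->1 and 1->2 correspond to G and A respectively.
peaks = [100.0, 157.02146, 228.05857, 300.0]
for i, j, residue in amino_acid_edges(peaks):
    print(f"peak {i} -> peak {j}: {residue}")
```

Chains of such edges form candidate sequence paths through the spectrum. A learned model would presumably weight or aggregate this evidence rather than apply a hard mass tolerance as done here.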