QM · AI · Feb 17, 2024

Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

arXiv:2402.11363v3 · 1 citation · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in proteomics by enabling more accurate peptide identification from DIA data, though it is an incremental advance that builds on existing deep learning approaches.

The paper tackles de novo peptide sequencing from Data-Independent Acquisition (DIA) mass spectrometry data, which is difficult because each spectrum is highly multiplexed. It introduces DiaTrans, a transformer-based model that improves precision and recall over existing methods, with gains of up to 34.8% in precision and 31.94% in recall at the amino acid level.
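To make the multiplexing problem concrete, the following is a minimal sketch, not taken from the paper, of how fragment ions from two co-isolated peptides end up interleaved in a single peak list. The peptide strings `PEPK` and `GASR`, the function names, and the restriction to singly charged b- and y-ions are illustrative assumptions. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the amino acids used in this example.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "K": 128.09496,
    "E": 129.04259,
    "R": 156.10111,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_mzs(peptide):
    """Return singly charged b- and y-ion m/z values for one peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON           # N-terminal fragment
        y = sum(masses[i:]) + WATER + PROTON   # C-terminal fragment
        ions.append((round(b, 4), "b%d" % i))
        ions.append((round(y, 4), "y%d" % (len(masses) - i)))
    return ions


def multiplexed_spectrum(peptides):
    """Merge fragments of co-isolated peptides into one m/z-sorted peak list."""
    peaks = []
    for pep in peptides:
        for mz, label in fragment_mzs(pep):
            peaks.append((mz, pep, label))
    return sorted(peaks)


# Hypothetical pair of peptides fragmented in the same isolation window.
spectrum = multiplexed_spectrum(["PEPK", "GASR"])
for mz, pep, label in spectrum:
    print("%10.4f  %s  %s" % (mz, pep, label))
```

In real DIA data the peptide and ion labels in the second and third columns are unknown. A de novo model sees only the merged m/z values (with intensities) and has to work out which peaks belong to which precursor.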

Tandem mass spectrometry (MS/MS) is the predominant high-throughput technique for comprehensively analyzing the protein content of biological samples, and it is a cornerstone of proteomics. In recent years, substantial progress has been made in Data-Independent Acquisition (DIA) strategies, which fragment precursor ions in an unbiased, non-targeted way. DIA-generated MS/MS spectra are difficult to interpret because they are highly multiplexed: each spectrum contains fragment ions originating from multiple precursor peptides. This is a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to handle multiplexing. In this paper, we introduce DiaTrans, a deep-learning model based on the transformer architecture that deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. DiaTrans (also referred to as Casanovo-DIA) improves precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and improves precision by 59% to 81.36% at the peptide level. Combining DIA data with our DiaTrans model holds considerable promise for uncovering novel peptides and for more comprehensive profiling of biological samples. DiaTrans is freely available under the GNU GPL at https://github.com/Biocomputing-Research-Group/DiaTrans.
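The amino-acid-level and peptide-level figures quoted above can be illustrated with a simplified sketch. It assumes position-wise exact matching between predicted and reference sequences. The paper's own evaluation may differ (de novo benchmarks commonly match residues by mass within a tolerance), and the function names and example sequences here are hypothetical.

```python
def aa_matches(predicted, target):
    """Count positions where predicted and target residues agree."""
    return sum(p == t for p, t in zip(predicted, target))


def evaluate(pairs):
    """Compute precision/recall from (predicted, target) sequence pairs.

    An empty predicted string means the model made no call for that spectrum.
    """
    matched = sum(aa_matches(p, t) for p, t in pairs)
    n_pred_aa = sum(len(p) for p, _ in pairs)    # residues the model emitted
    n_true_aa = sum(len(t) for _, t in pairs)    # residues in the references
    n_pred_pep = sum(1 for p, _ in pairs if p)   # spectra with a prediction
    n_correct_pep = sum(1 for p, t in pairs if p and p == t)
    return {
        "aa_precision": matched / n_pred_aa if n_pred_aa else 0.0,
        "aa_recall": matched / n_true_aa if n_true_aa else 0.0,
        "peptide_precision": n_correct_pep / n_pred_pep if n_pred_pep else 0.0,
        "peptide_recall": n_correct_pep / len(pairs) if pairs else 0.0,
    }


# Hypothetical predictions: one exact, one with a wrong last residue, one missing.
scores = evaluate([("PEPK", "PEPK"), ("GASK", "GASR"), ("", "AGLK")])
print(scores)
```

Under this definition, precision penalizes wrong residues among those the model emitted, while recall also penalizes spectra for which no sequence was produced. That is why the two metrics are reported separately.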

Code Implementations: 3 repos
