AIQM · Dec 14, 2023

Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model

arXiv:2312.08987v1 · 21 citations · h-index: 14 · Nat Comput Sci
Originality: Highly original
AI Analysis

This work addresses the need for efficient computational tools to analyze signal peptides in large-scale protein sequences, such as in metagenomics, offering a faster alternative to experimental methods.

The authors tackled the problem of predicting signal peptides in proteins by developing USPNet, a deep learning method that uses protein language models and evolutionary information, achieving high sensitivity and organism-agnostic performance.

A signal peptide (SP) is a short peptide located at the N-terminus of a protein. It is essential for targeting and transferring transmembrane and secreted proteins to their correct positions. Compared with traditional experimental methods for identifying signal peptides, computational methods are faster and more efficient, which makes them more practical for analyzing thousands or even millions of protein sequences, especially in metagenomic data. Here we present the Unbiased Organism-agnostic Signal Peptide Network (USPNet), a deep learning method for signal peptide classification and cleavage site prediction that takes advantage of protein language models. We propose applying a label-distribution-aware margin (LDAM) loss to handle data imbalance, and using evolutionary information of proteins to enrich the representation and overcome the dependence on species information.
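To make the data-imbalance idea concrete, here is a minimal sketch of a label-distribution-aware margin loss in its commonly used general form: each class gets a margin proportional to n_j^(-1/4), where n_j is the class's training count, and that margin is subtracted from the true-class logit before a scaled softmax cross-entropy. This is not USPNet's actual implementation. The function names, the `max_margin` and `scale` defaults, and the class counts are all assumptions chosen for illustration.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4).

    Margins are rescaled so that the rarest class receives max_margin
    (an assumed hyperparameter, not a value taken from the paper).
    """
    raw = [1.0 / (n ** 0.25) for n in class_counts]
    factor = max_margin / max(raw)
    return [r * factor for r in raw]

def ldam_loss(logits, target, margins, scale=30.0):
    """Scaled softmax cross-entropy with a margin on the true class only."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]  # demand a larger gap for rare classes
    z = [scale * v for v in adjusted]
    peak = max(z)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in z))
    return log_sum_exp - z[target]

# Hypothetical, heavily imbalanced class counts (illustrative only).
counts = [10000, 1000, 100]
margins = ldam_margins(counts)
loss = ldam_loss([1.0, 2.0, 1.5], target=2, margins=margins)
```

Because rarer classes receive larger margins, a minority-class example must be separated from the other classes by a wider gap before its loss becomes small, which counteracts the bias toward majority classes.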

Code Implementations: 1 repo