QM | LG | Feb 23

Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control

arXiv:2602.20209v1 | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating physically plausible peptide sequences for protein discovery. The authors present it as a substantial advancement for computational biology, but its originality is incremental: it applies diffusion models to an existing task.

The paper tackles de novo peptide sequencing from mass spectra, where existing models enforce the mass consistency constraint inadequately. It introduces DiffuNovo, a regressor-guided diffusion model that applies mass control during both training and inference. The authors report state-of-the-art accuracy and a significant reduction in mass error, which yields more physically plausible peptides.

The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint: a predicted peptide's mass must match the experimentally measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it in post-processing, leading to numerous implausible predictions that violate this physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization; at inference, regressor-based guidance applies gradient-based updates in the latent space to steer generation so that the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to employ a diffusion model as its core backbone, DiffuNovo leverages the controllability of the diffusion architecture and achieves a significant reduction in mass error, thereby producing much more physically plausible peptides. These innovations represent a substantial advancement toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
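
To make the mass consistency constraint concrete, the following is a minimal Python sketch, not the authors' implementation. The first part computes a peptide's monoisotopic mass from standard residue masses (unmodified residues only) and its error in ppm against a measured precursor mass. The second part illustrates the general idea of gradient-based guidance with a deliberately simplified stand-in: a hypothetical linear mass regressor over a latent vector. The names `peptide_mass`, `ppm_error`, and `guided_step`, the linear regressor, and the step size are all assumptions for illustration; the abstract does not specify DiffuNovo's regressor or update rule.

```python
# Monoisotopic residue masses in Da (standard values, no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one H2O for the N- and C-terminal groups


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def ppm_error(predicted_mass, precursor_mass):
    """Signed mass error in parts per million relative to the precursor."""
    return (predicted_mass - precursor_mass) / precursor_mass * 1e6


def guided_step(z, w, b, target_mass, step_size):
    """One gradient step on the squared mass error (toy illustration).

    Assumes a hypothetical linear regressor mass(z) = w . z + b.
    The gradient of (mass(z) - target)^2 with respect to z is
    2 * (mass(z) - target) * w, and we move z against it.
    """
    predicted = sum(wi * zi for wi, zi in zip(w, z)) + b
    residual = predicted - target_mass
    return [zi - step_size * 2.0 * residual * wi for wi, zi in zip(w, z)]


if __name__ == "__main__":
    mass = peptide_mass("PEPTIDE")
    print(f"PEPTIDE mass: {mass:.4f} Da")
    print(f"ppm error vs 799.3600: {ppm_error(mass, 799.36):.3f}")

    # Toy guidance: pull a latent toward a target "mass" of 10.0.
    z, w, b = [1.0, 1.0], [1.0, 2.0], 0.0
    for _ in range(5):
        z = guided_step(z, w, b, target_mass=10.0, step_size=0.05)
    print("guided latent:", z)
```

A check like `abs(ppm_error(...))` below some tolerance is how a prediction could be screened in post-processing. Guidance differs in that the mass signal shapes the sample during generation instead of filtering it afterwards.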

