LG · AI · Apr 12

PepBenchmark: A Standardized Benchmark for Peptide Machine Learning

arXiv:2604.10531 · 51.31 citations · h-index: 23 · Has Code
Predicted impact: top 42% in LG (last 90 days) · Originality: Incremental advance
AI Analysis

For researchers in peptide drug discovery, this benchmark addresses the lack of standardized evaluation, enabling fair comparison and accelerating methodological progress.

PepBenchmark provides the first standardized benchmark for peptide machine learning. It unifies datasets, preprocessing, and evaluation protocols across 35 datasets and 4 model families, enabling consistent comparison of methods and advancing peptide drug discovery.

Peptide therapeutics are widely regarded as the "third generation" of drugs, yet progress in peptide machine learning (ML) is hindered by the absence of standardized benchmarks. Here we present PepBenchmark, which unifies datasets, preprocessing, and evaluation protocols for peptide drug discovery. PepBenchmark comprises three components: (1) PepBenchData, a well-curated collection of 29 canonical-peptide and 6 non-canonical-peptide datasets across 7 groups that systematically covers key aspects of peptide drug development and represents, to the best of our knowledge, the most comprehensive AI-ready dataset resource to date; (2) PepBenchPipeline, a standardized preprocessing pipeline that ensures consistent dataset cleaning, construction, splitting, and feature transformation, mitigating the quality issues common in ad hoc pipelines; and (3) PepBenchLeaderboard, a unified evaluation protocol and leaderboard with strong baselines across 4 major methodological families: fingerprint-based, GNN-based, PLM-based, and SMILES-based models. Together, these components provide the first standardized and comparable foundation for peptide drug discovery, facilitating methodological advances and their translation into real-world applications. The data and code are publicly available at https://github.com/ZGCI-AI4S-Pep/PepBenchmark/.
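The abstract describes PepBenchPipeline only at a high level. As a hedged illustration of why fixing these steps matters, here is a minimal sketch of reproducible cleaning, splitting, and feature transformation for canonical peptides. It is not the PepBenchmark API: `Record`, `clean`, `assign_split`, and `composition` are hypothetical names, and the conflicting-label policy, the hash-based random split (not a similarity-aware one), and the amino-acid-composition feature are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical illustration only; not the PepBenchmark API.
# The 20 canonical amino acids as one-letter codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
AA_ORDER = sorted(CANONICAL)


@dataclass(frozen=True)
class Record:
    sequence: str
    label: float


def clean(records):
    """Normalize sequences, drop non-canonical ones, and deduplicate.

    Exact duplicates (after normalization) are kept once; sequences that
    appear with conflicting labels are dropped entirely (assumed policy).
    """
    first_label = {}
    conflicts = set()
    for rec in records:
        seq = rec.sequence.strip().upper()
        if not seq or set(seq) - CANONICAL:
            continue  # empty, or contains a non-canonical residue
        if seq in first_label and first_label[seq] != rec.label:
            conflicts.add(seq)
        first_label.setdefault(seq, rec.label)
    return [Record(s, y) for s, y in first_label.items() if s not in conflicts]


def assign_split(seq, valid_frac=0.1, test_frac=0.1):
    """Deterministic random split: hash the sequence to a number in [0, 1)."""
    h = int(hashlib.sha256(seq.encode("utf-8")).hexdigest()[:8], 16) / 16**8
    if h < test_frac:
        return "test"
    if h < test_frac + valid_frac:
        return "valid"
    return "train"


def composition(seq):
    """Amino-acid composition: fraction of each canonical residue (20 features)."""
    return [seq.count(aa) / len(seq) for aa in AA_ORDER]


raw = [
    Record("ACDK", 1.0),
    Record(" acdk", 1.0),  # same peptide after normalization: kept once
    Record("GGXK", 0.0),   # 'X' is non-canonical: dropped
    Record("LLKK", 0.0),
    Record("LLKK", 1.0),   # conflicting label: both copies dropped
    Record("WWPR", 1.0),
]
cleaned = clean(raw)
for rec in cleaned:
    print(rec.sequence, assign_split(rec.sequence), composition(rec.sequence)[:3])
```

Hashing the sequence rather than shuffling with a seed means a peptide's split assignment does not depend on row order or on which other records survive cleaning, which is one simple way to keep splits stable across reruns.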
