PDFBench: A Benchmark for De novo Protein Design from Function
PDFBench addresses inconsistent and limited evaluation in protein design, a problem for researchers in drug discovery and enzyme engineering. The contribution is incremental, however, because it focuses on benchmarking rather than on new design methods.
The authors tackled the lack of a unified evaluation framework for function-guided protein design by introducing PDFBench, a comprehensive benchmark that systematically evaluates eight state-of-the-art models on 16 metrics across two settings, enabling more reliable comparisons and providing key insights for future research.
Function-guided protein design is a crucial task with significant applications in drug discovery and enzyme engineering. However, the field lacks a unified and comprehensive evaluation framework. Current models are assessed using inconsistent and limited subsets of metrics, which prevents fair comparison and a clear understanding of the relationships between different evaluation criteria. To address this gap, we introduce PDFBench, the first comprehensive benchmark for function-guided de novo protein design. Our benchmark systematically evaluates eight state-of-the-art models on 16 metrics across two key settings. The first is description-guided design, for which we repurpose the Mol-Instructions dataset, which originally lacked quantitative benchmarking. The second is keyword-guided design, for which we introduce a new test set, SwissTest, created with a strict datetime cutoff to ensure data integrity. By benchmarking across a wide array of metrics and analyzing their correlations, PDFBench enables more reliable model comparisons and provides key insights to guide future research.
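As an illustration of what analyzing correlations between metrics can look like, here is a minimal sketch that computes pairwise Spearman rank correlations between metrics from per-model scores. The abstract does not state which correlation measure PDFBench uses, so Spearman is an assumption here. The metric names (`metric_a`, `metric_b`, `metric_c`), the helper functions, and the toy scores are all hypothetical placeholders, not results from the benchmark.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def metric_correlations(scores):
    """Map each unordered metric pair to its Spearman correlation.

    `scores` maps a metric name to a list of scores, one per model,
    with the models in the same order for every metric.
    """
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Hypothetical toy scores for four models (illustrative values only).
toy_scores = {
    "metric_a": [0.1, 0.4, 0.35, 0.8],
    "metric_b": [10.0, 30.0, 20.0, 50.0],
    "metric_c": [0.9, 0.5, 0.6, 0.2],
}
correlations = metric_correlations(toy_scores)
for pair, rho in correlations.items():
    print(pair, round(rho, 3))
```

In this toy data, `metric_a` and `metric_b` rank the four models identically, while `metric_c` ranks them in exactly the reverse order. A pair of metrics that correlates strongly in this way carries largely redundant information about model ranking, which is the kind of relationship a correlation analysis is meant to surface.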