LGQM Mar 31, 2025

Why risk matters for protein binder design

ETH Zurich
arXiv:2504.00146v2
h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses risk-aware optimization for protein binder design. The contribution is incremental: it applies existing metrics from portfolio optimization to a specific domain.

The study compared 72 Bayesian optimization model combinations (encodings, surrogate models, and acquisition functions) on 11 protein binder fitness landscapes, with a focus on risk and cost constraints. The results suggest that Pareto-optimal models exist on the risk-performance axis, and that model performance correlates with landscape properties such as epistasis.

Bayesian optimization (BO) has recently become more prevalent in protein engineering applications and hence has become a fruitful target of benchmarks. However, current BO comparisons often overlook real-world considerations like risk and cost constraints. In this work, we compare 72 model combinations of encodings, surrogate models, and acquisition functions on 11 protein binder fitness landscapes, specifically from this perspective. Drawing from the portfolio optimization literature, we adopt metrics to quantify the cold-start performance relative to a random baseline, to assess the risk of an optimization campaign, and to calculate the overall budget required to reach a fitness threshold. Our results suggest the existence of Pareto-optimal models on the risk-performance axis, the shift of this preference depending on the landscape explored, and the robust correlation between landscape properties such as epistasis and the average and worst-case model performance. They also highlight that rigorous model selection requires substantial computational and statistical efforts.
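To make the notion of Pareto-optimality on the risk-performance axis concrete, here is a minimal Python sketch. It assumes each model combination has already been reduced to two scalar scores, a performance value (higher is better) and a risk value (lower is better). The abstract does not define the paper's metrics, so the `ModelScore` type, the `pareto_front` helper, the model labels, and all numbers below are hypothetical illustrations and not results from the paper.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    """Hypothetical summary of one model combination (not from the paper)."""
    name: str           # illustrative label for an encoding/surrogate/acquisition combo
    performance: float  # assumed: higher is better
    risk: float         # assumed: lower is better


def pareto_front(scores: List[ModelScore]) -> List[ModelScore]:
    """Return the models that no other model dominates.

    A model is dominated if another model is at least as good on both
    axes and strictly better on at least one of them.
    """
    front = []
    for s in scores:
        dominated = any(
            o.performance >= s.performance
            and o.risk <= s.risk
            and (o.performance > s.performance or o.risk < s.risk)
            for o in scores
        )
        if not dominated:
            front.append(s)
    return front


# Made-up scores, purely for illustration.
scores = [
    ModelScore("A", performance=0.8, risk=0.5),
    ModelScore("B", performance=0.6, risk=0.2),
    ModelScore("C", performance=0.5, risk=0.4),  # dominated by B
    ModelScore("D", performance=0.8, risk=0.6),  # dominated by A
]

front_names = [s.name for s in pareto_front(scores)]
print(front_names)
```

In this toy example, A and B remain on the front: A offers higher performance at higher risk, and B offers lower risk at lower performance, so the choice between them depends on the risk tolerance of the campaign.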

