LGAINEBMQMSep 27, 2024

Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design

arXiv:2409.18582v22 citationsh-index: 16
Originality Incremental advance
AI Analysis

This addresses a bottleneck in fields like drug discovery and protein design by enabling scalable optimization in combinatorial domains, though it is an incremental improvement over existing BO methods.

The paper tackles the problem of Bayesian optimization over large combinatorial spaces, which is infeasible with existing methods, by proposing GameOpt, a game-theoretical approach that efficiently breaks down complexity into individual decision sets. It demonstrates this on protein design, discovering highly active variants quickly on real-world datasets, with each protein having up to 20^X possible configurations.

Bayesian optimization (BO) is a powerful framework to optimize black-box expensive-to-evaluate functions via sequential interactions. In several important problems (e.g. drug discovery, circuit design, neural architecture search, etc.), though, such functions are defined over large $\textit{combinatorial and unstructured}$ spaces. This makes existing BO algorithms not feasible due to the intractable maximization of the acquisition function over these domains. To address this issue, we propose $\textbf{GameOpt}$, a novel game-theoretical approach to combinatorial BO. $\textbf{GameOpt}$ establishes a cooperative game between the different optimization variables, and selects points that are game $\textit{equilibria}$ of an upper confidence bound acquisition function. These are stable configurations from which no variable has an incentive to deviate$-$ analog to local optima in continuous domains. Crucially, this allows us to efficiently break down the complexity of the combinatorial domain into individual decision sets, making $\textbf{GameOpt}$ scalable to large combinatorial spaces. We demonstrate the application of $\textbf{GameOpt}$ to the challenging $\textit{protein design}$ problem and validate its performance on four real-world protein datasets. Each protein can take up to $20^{X}$ possible configurations, where $X$ is the length of a protein, making standard BO methods infeasible. Instead, our approach iteratively selects informative protein configurations and very quickly discovers highly active protein variants compared to other baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes