LG | ML | Jun 7, 2024

A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences

arXiv:2406.04739v2 | 26 citations
AI Analysis

This work addresses the need for standardized evaluation and replicability in discrete black-box optimization, which matters to practitioners in fields such as protein engineering and drug design. The contribution is incremental in that it builds on existing methods.

The authors address heterogeneous experimental setups and replicability problems in high-dimensional Bayesian optimization of discrete sequences. They develop a unified benchmark framework with standardized black-box functions and supporting software libraries, which makes it easier to test methods and apply them in domains such as chemistry and biology.

Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods and technical barriers for the replicability and application of published algorithms to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers. Project website: https://machinelearninglifescience.github.io/hdbo_benchmark
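
The two-component design described in the abstract (a collection of black-box objectives on one side, interchangeable discrete optimizers on the other) can be illustrated with a minimal, self-contained sketch. This is not the poli or poli-baselines API: the names `toy_black_box` and `random_mutation_optimizer`, the target sequence, and the scoring rule are all hypothetical, and the optimizer is a simple random-mutation baseline rather than a Bayesian optimization method.

```python
import random
from typing import Callable, Tuple

# The 20 standard amino-acid letters, used here as the discrete alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_black_box(target: str) -> Callable[[str], float]:
    """Hypothetical objective: number of positions matching a hidden target.

    The optimizer only ever sees the returned score, never the target,
    which mimics the gradient-free black-box setting.
    """
    def f(seq: str) -> float:
        return float(sum(a == b for a, b in zip(seq, target)))
    return f


def random_mutation_optimizer(
    f: Callable[[str], float],
    x0: str,
    n_evaluations: int,
    rng: random.Random,
) -> Tuple[str, float]:
    """Baseline optimizer: mutate one position, keep the candidate if not worse."""
    best_x, best_y = x0, f(x0)
    for _ in range(n_evaluations):
        pos = rng.randrange(len(best_x))
        candidate = best_x[:pos] + rng.choice(ALPHABET) + best_x[pos + 1:]
        y = f(candidate)
        if y >= best_y:
            best_x, best_y = candidate, y
    return best_x, best_y


# Any objective with the signature str -> float can be paired with any
# optimizer that accepts such a callable; that separation is the point.
rng = random.Random(0)
target = "MKTAYIAKQR"  # hypothetical hidden optimum
f = toy_black_box(target)
x0 = "A" * len(target)
best_x, best_y = random_mutation_optimizer(f, x0, n_evaluations=2000, rng=rng)
print(best_x, best_y)
```

Because the optimizer depends only on a callable from sequences to scores, a new objective or a new optimizer can be swapped in without touching the other side, which is the kind of extensibility the abstract attributes to its two libraries.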

Code Implementations: 1 repo
