BOSS: Bayesian Optimization over String Spaces
BOSS addresses a bottleneck in applying Bayesian optimization to string-based problems, such as automated code generation or molecular design: rather than first learning a latent-space projection of the inputs, it optimizes directly over the raw strings.
The paper tackles Bayesian optimization over raw strings with a method that combines string kernels and genetic algorithms, eliminating the need for latent-space projections that are expensive in both computation and data. Experiments show considerably improved optimization over existing approaches across a range of constraints, including syntax governed by context-free grammars.
This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar.
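The loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: a cosine-normalized n-gram (spectrum) kernel stands in for whichever string kernel the paper uses, expected improvement is assumed as the acquisition function, and the genetic algorithm works on unconstrained fixed-length strings rather than grammar-constrained ones. All function names, the toy alphabet, and the hyperparameters are hypothetical.

```python
import math
from collections import Counter
import numpy as np

def ngram_counts(s, n=2):
    """Count contiguous n-grams of s (empty if len(s) < n)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def string_kernel(a, b, n=2):
    """Cosine-normalized n-gram kernel (a stand-in); handles unequal lengths."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm > 0 else 0.0

def gp_posterior(X, y, cands, noise=1e-3):
    """GP posterior mean and std at cands, using a constant prior mean."""
    K = np.array([[string_kernel(a, b) for b in X] for a in X])
    Ks = np.array([[string_kernel(c, a) for a in X] for c in cands])
    L = np.linalg.cholesky(K + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y.mean()))
    v = np.linalg.solve(L, Ks.T)
    mu = y.mean() + Ks @ alpha
    var = 1.0 - np.sum(v * v, axis=0)  # prior variance k(c, c) = 1 here
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement for maximization (assumed acquisition function)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def random_string(rng, alphabet, length):
    return "".join(rng.choice(list(alphabet), size=length))

def mutate(s, rng, alphabet):
    """Replace one randomly chosen character."""
    i = rng.integers(len(s))
    return s[:i] + alphabet[rng.integers(len(alphabet))] + s[i + 1:]

def crossover(a, b, rng):
    """One-point crossover of two equal-length strings."""
    i = rng.integers(1, len(a))
    return a[:i] + b[i:]

def ga_maximize(acq, rng, alphabet, length, pop_size=30, generations=10):
    """Maximize the acquisition function with a simple genetic algorithm."""
    pop = [random_string(rng, alphabet, length) for _ in range(pop_size)]
    for _ in range(generations):
        order = np.argsort(acq(pop))[::-1]
        parents = [pop[i] for i in order[:pop_size // 2]]  # keep the top half
        children = []
        while len(parents) + len(children) < pop_size:
            a = parents[rng.integers(len(parents))]
            b = parents[rng.integers(len(parents))]
            children.append(mutate(crossover(a, b, rng), rng, alphabet))
        pop = parents + children
    return pop[int(np.argmax(acq(pop)))]

def bo_loop(objective, alphabet="ACGT", length=8, n_init=5, n_iter=10, seed=0):
    """Bayesian optimization over strings: fit GP, maximize EI with GA, evaluate."""
    rng = np.random.default_rng(seed)
    X = [random_string(rng, alphabet, length) for _ in range(n_init)]
    y = [float(objective(x)) for x in X]
    for _ in range(n_iter):
        y_arr = np.array(y)
        def acq(cands):
            mu, sigma = gp_posterior(X, y_arr, cands)
            return expected_improvement(mu, sigma, y_arr.max())
        x_next = ga_maximize(acq, rng, alphabet, length)
        X.append(x_next)
        y.append(float(objective(x_next)))
    return X, y

if __name__ == "__main__":
    # Toy objective (hypothetical): number of "AC" occurrences in the string.
    X, y = bo_loop(lambda s: s.count("AC"))
    print(X[int(np.argmax(y))], max(y))
```

In this sketch the surrogate touches strings only through `string_kernel`, and the acquisition optimizer touches them only through `mutate` and `crossover`. A different string kernel, or mutation and crossover operators that respect syntactic constraints, could therefore be substituted without changing the rest of the loop.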