Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
This work addresses how to mitigate the performance loss caused by imperfect surrogates in offline optimization, with applications in science and engineering. Its contribution is incremental, since it builds on existing methods.
The paper tackles the problem of inaccurate surrogate models in offline black-box optimization, which limit performance when optimizing designs in fields such as materials science. It presents a theoretical framework and an algorithm that improve over prior methods on real-world benchmarks.
Offline design optimization problems arise in numerous science and engineering applications, including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside the offline data regime. This challenge raises a fundamental question: how does an imperfect surrogate model affect the performance gap between its optima and the true optima, and to what extent can that performance loss be mitigated? Although prior work has developed methods to improve the robustness of surrogate models and their associated optimization processes, a provably quantifiable relationship between an imperfect surrogate and the corresponding performance gap remains elusive, as does the question of whether prior methods directly address it. To shed light on this question, we present a theoretical framework for understanding offline black-box optimization that explicitly bounds the optimization quality in terms of how well the surrogate matches the latent gradient field that underlies the offline data. Inspired by our theoretical analysis, we propose a principled black-box gradient matching algorithm to create effective surrogate models for offline optimization, improving over prior approaches on various real-world benchmarks.
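To make the idea of matching a latent gradient field more concrete, here is a minimal sketch under stated assumptions; it is not the paper's algorithm. It relies on the line-integral identity g(x_b) - g(x_a) = (x_b - x_a)^T ∫ ∇g(x_a + t(x_b - x_a)) dt over t in [0, 1], which lets a candidate gradient field be scored against offline data using only observed output differences. The function names, the toy objective f(x) = -||x||^2, the all-pairs choice, and the midpoint quadrature are all hypothetical choices made for illustration.

```python
import numpy as np

def line_integral_delta(grad_fn, x_a, x_b, n_steps=32):
    """Approximate g(x_b) - g(x_a) by integrating grad_fn along the
    straight segment from x_a to x_b (midpoint rule)."""
    ts = (np.arange(n_steps) + 0.5) / n_steps         # midpoints in (0, 1)
    points = x_a + ts[:, None] * (x_b - x_a)          # shape (n_steps, d)
    grads = np.stack([grad_fn(p) for p in points])    # shape (n_steps, d)
    return float(grads.mean(axis=0) @ (x_b - x_a))

def gradient_matching_loss(grad_fn, X, y, pairs):
    """Mean squared gap between observed output differences and the
    differences implied by the candidate gradient field."""
    errs = [(y[j] - y[i]) - line_integral_delta(grad_fn, X[i], X[j])
            for i, j in pairs]
    return float(np.mean(np.square(errs)))

# Toy offline dataset from a hypothetical objective f(x) = -||x||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = -np.sum(X ** 2, axis=1)
pairs = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]

def true_grad(x):
    # Exact gradient field of the toy objective.
    return -2.0 * x

def flat_grad(x):
    # A deliberately poor candidate: a gradient field that is zero everywhere.
    return np.zeros_like(x)

loss_true = gradient_matching_loss(true_grad, X, y, pairs)
loss_flat = gradient_matching_loss(flat_grad, X, y, pairs)
print(loss_true, loss_flat)
```

In this toy setting the exact gradient field attains a loss of essentially zero, while the flat field does not. In practice the candidate gradient would presumably come from a parametric surrogate trained to reduce such a loss, but that training loop is beyond what this sketch shows.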