LG | AI | MS | OC | ML | Jun 16, 2022

Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

arXiv:2206.08366v1 | 13 citations | h-index: 12
AI Analysis

This work addresses the computational bottleneck that researchers and practitioners face when using gradient-enhanced Bayesian Optimization on high-dimensional problems. Its structured matrix-vector multiply applies to virtually all canonical kernels.

The paper tackles the poor scaling of Bayesian Optimization (BO) in high dimensions by exploiting the structured matrices that arise in gradient-based Gaussian process surrogates. For $n$ observations in $d$ dimensions, this yields an exact matrix-vector multiply in $\mathcal{O}(n^2d)$ operations for gradient observations, which enables flexible modeling while scaling to high $d$.

Bayesian Optimization (BO) has shown great promise for the global optimization of functions that are expensive to evaluate, but despite many successes, standard approaches can struggle in high dimensions. To improve the performance of BO, prior work suggested incorporating gradient information into a Gaussian process surrogate of the objective, giving rise to kernel matrices of size $nd \times nd$ for $n$ observations in $d$ dimensions. Naïvely multiplying with (resp. inverting) these matrices requires $\mathcal{O}(n^2d^2)$ (resp. $\mathcal{O}(n^3d^3)$) operations, which becomes infeasible for moderate dimensions and sample sizes. Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2d^2)$ for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high $d$.
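To make the complexity claim concrete, here is a minimal NumPy sketch for one special case, the RBF kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / 2\ell^2)$. It is an illustration only, not the paper's implementation or its automatic differentiation algorithm. For this kernel, each $d \times d$ gradient-gradient block is $k_{ij}\,(I/\ell^2 - r_{ij} r_{ij}^\top/\ell^4)$ with $r_{ij} = x_i - x_j$, a scaled identity plus a rank-one term, so the product can be formed without materializing the $nd \times nd$ matrix. The function names (`structured_grad_mvm`, `dense_grad_kernel`) are hypothetical, and only the gradient-gradient part of the kernel matrix is covered (no value observations or cross-covariances).

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """n x n RBF kernel matrix for the rows of X (shape n x d)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(D2, 0.0) / lengthscale**2)


def structured_grad_mvm(X, V, lengthscale=1.0):
    """Multiply the gradient-gradient kernel matrix by V in O(n^2 d).

    V has shape n x d; row j is the slice of the input vector that
    pairs with the gradient observation at x_j.
    """
    l2 = lengthscale**2
    K = rbf_kernel(X, lengthscale)
    A = X @ V.T                     # A[i, j] = x_i . v_j
    S = A - np.diag(A)[None, :]     # S[i, j] = (x_i - x_j) . v_j
    W = K * S / l2**2               # weights of the rank-one terms
    # sum_j W[i, j] * (x_i - x_j) = rowsum(W)[i] * x_i - (W @ X)[i]
    rank_one = W.sum(axis=1)[:, None] * X - W @ X
    return K @ V / l2 - rank_one


def dense_grad_kernel(X, lengthscale=1.0):
    """Reference nd x nd matrix, O(n^2 d^2) memory; for checking only."""
    n, d = X.shape
    l2 = lengthscale**2
    K = rbf_kernel(X, lengthscale)
    G = np.zeros((n * d, n * d))
    for i in range(n):
        for j in range(n):
            r = X[i] - X[j]
            block = K[i, j] * (np.eye(d) / l2 - np.outer(r, r) / l2**2)
            G[i * d:(i + 1) * d, j * d:(j + 1) * d] = block
    return G


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    V = rng.normal(size=(5, 3))
    fast = structured_grad_mvm(X, V, lengthscale=1.3)
    slow = dense_grad_kernel(X, lengthscale=1.3) @ V.ravel()
    print(np.allclose(fast.ravel(), slow))
```

The structured routine uses only $n \times n$ and $n \times d$ intermediates, so each multiply costs $\mathcal{O}(n^2d)$ time, whereas the dense reference needs $\mathcal{O}(n^2d^2)$ time and memory. A multiply of this kind is typically paired with an iterative solver such as conjugate gradients, so the matrix never needs to be inverted directly.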

Code Implementations: 1 repo