Understanding High-Dimensional Bayesian Optimization
This paper addresses a practical challenge for researchers and practitioners, the optimization of high-dimensional functions, though its contribution is incremental in that it refines existing methods.
The paper investigates why simple Bayesian optimization methods succeed on high-dimensional real-world tasks, contrary to prior expectations. It finds that vanishing gradients caused by Gaussian process initialization schemes are a major source of failures and that maximum likelihood estimation of length scales suffices for state-of-the-art performance.
Recent work reported that simple Bayesian optimization (BO) methods perform well on high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify the underlying challenges that arise in high-dimensional Bayesian optimization (HDBO) and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of HDBO and that methods that promote local search behavior are better suited to the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Building on these findings, we propose MSR, a simple variant of MLE that achieves state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.
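The vanishing-gradient effect mentioned above can be illustrated with a small numerical sketch. This is not the paper's experimental setup: it assumes an RBF kernel, inputs drawn uniformly from the unit hypercube, and uses the derivative of a single kernel entry with respect to a shared length scale as a rough proxy for the gradient signal available when fitting a GP. The function names and the sqrt(D) scaling used for comparison are illustrative choices.

```python
import numpy as np


def rbf_kernel_and_grad(x1, x2, lengthscale):
    """Return the RBF kernel value and its derivative w.r.t. the length scale."""
    sq_dist = float(np.sum((x1 - x2) ** 2))
    k = np.exp(-0.5 * sq_dist / lengthscale**2)
    # d/dl exp(-r^2 / (2 l^2)) = k * r^2 / l^3
    dk_dl = k * sq_dist / lengthscale**3
    return k, dk_dl


def mean_abs_grad(dim, lengthscale, n_pairs=200, seed=0):
    """Average |dk/dl| over random pairs of points in [0, 1]^dim (illustrative proxy)."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(n_pairs):
        x1, x2 = rng.uniform(size=(2, dim))
        _, g = rbf_kernel_and_grad(x1, x2, lengthscale)
        grads.append(abs(g))
    return float(np.mean(grads))


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        fixed = mean_abs_grad(dim, lengthscale=1.0)            # fixed initial value
        scaled = mean_abs_grad(dim, lengthscale=np.sqrt(dim))  # grows with dimension
        print(f"D={dim:5d}  fixed l=1: {fixed:.3e}   l=sqrt(D): {scaled:.3e}")
```

Under these assumptions, the expected squared distance between two points grows linearly with the dimension, so with a fixed length scale the kernel entries and their derivatives shrink exponentially as the dimension increases, while a length scale that grows with the dimension keeps the derivative at a usable magnitude. An optimizer started from the fixed value in high dimensions would therefore receive almost no gradient signal in this toy setting.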