ML · LG · Nov 27, 2019

Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization

arXiv:1911.11950v2 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck in high-dimensional Bayesian optimization for researchers and practitioners, offering a trade-off between convergence rate and computational budget, though it is incremental relative to existing subspace-based approaches.

The paper tackles the challenge of scaling Bayesian optimization to high-dimensional spaces without structural assumptions by proposing a method that maximizes the acquisition function on low-dimensional subspaces. When sufficiently many subspaces are used, it achieves a cumulative regret bound of O*(√(Tγ_T)) compared to O*(√(DTγ_T)) for GP-UCB (Srinivas et al., 2012), removing a factor of √D.

Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on the function structure. The main reason is that at each iteration, BO requires finding the global maximum of an acquisition function, which is itself a non-convex optimisation problem in the original search space. With growing dimensions, the computational budget for this maximisation becomes increasingly inadequate, leading to inaccurate solutions. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low-dimensional subspaces embedded in the original high-dimensional search space. Unlike many recent high-dimensional BO methods, our method is free of any low-dimensional structure assumption on the function. Optimising the acquisition function in low-dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm grows only sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate with the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most $\mathcal{O}^{*}(\sqrt{T\gamma_T})$ as opposed to $\mathcal{O}^{*}(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), removing a crucial factor of $\sqrt{D}$, where $D$ is the dimension of the input space.
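To make the idea of maximising an acquisition function only on low-dimensional subspaces concrete, here is a minimal sketch. It is not the paper's algorithm, because the abstract does not say how the subspaces are built. Everything specific here is an assumption made for illustration: axis-aligned subspaces passing through the best observed point, random candidate search inside each subspace, an RBF kernel with unit lengthscale, a fixed `beta`, a toy quadratic objective, and the hypothetical helper names `rbf_kernel`, `gp_ucb` and `maximise_on_subspaces`.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_ucb(Xq, X, y, beta=2.0, noise=1e-4):
    # Zero-mean GP posterior at the query points Xq, combined into a UCB score.
    # beta is fixed here for simplicity (assumption); theory uses a schedule.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)  # prior variance is k(x, x) = 1
    return mu + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))


def maximise_on_subspaces(X, y, D, d, n_subspaces, n_candidates, rng):
    # Assumption: each subspace varies d randomly chosen coordinates and keeps
    # the remaining D - d coordinates fixed at the best point seen so far.
    anchor = X[np.argmax(y)]
    best_x, best_val = anchor.copy(), -np.inf
    for _ in range(n_subspaces):
        coords = rng.choice(D, size=d, replace=False)
        cand = np.tile(anchor, (n_candidates, 1))
        cand[:, coords] = rng.uniform(0.0, 1.0, size=(n_candidates, d))
        scores = gp_ucb(cand, X, y)
        i = int(np.argmax(scores))
        if scores[i] > best_val:
            best_x, best_val = cand[i].copy(), float(scores[i])
    return best_x, best_val


def objective(x):
    # Toy objective on [0, 1]^D with its maximum at the centre (illustration only).
    return -np.sum((x - 0.5) ** 2)


rng = np.random.default_rng(0)
D, d = 20, 2  # ambient dimension and subspace dimension
X = rng.uniform(0.0, 1.0, size=(5, D))  # initial design
y = np.array([objective(x) for x in X])

for t in range(10):
    # Each inner search is only d-dimensional; more subspaces cost more compute.
    x_next, _ = maximise_on_subspaces(X, y, D, d, n_subspaces=8, n_candidates=64, rng=rng)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best observed value:", y.max())
```

In this sketch, `n_subspaces` is the budget knob: raising it spends more computation per iteration in exchange for a more thorough search of the acquisition function, which mirrors, loosely, the trade-off between convergence rate and number of subspaces that the abstract describes.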

