OCLGSep 20, 2022

Lower Bounds on the Worst-Case Complexity of Efficient Global Optimization

arXiv:2209.09655v19 citationsh-index: 20
Originality Highly original
AI Analysis

This provides theoretical insights into the inherent hardness of a widely used optimization method, which is incremental but important for algorithm design and analysis in fields like hyperparameter tuning and material design.

The paper tackles the problem of understanding the fundamental limits of efficient global optimization algorithms for expensive black-box functions, deriving a unified lower bound on worst-case complexity in terms of metric entropy of RKHS balls and showing it is nearly optimal for common kernels like squared exponential and Matérn.

Efficient global optimization is a widely used method for optimizing expensive black-box functions such as tuning hyperparameter, and designing new material, etc. Despite its popularity, less attention has been paid to analyzing the inherent hardness of the problem although, given its extensive use, it is important to understand the fundamental limits of efficient global optimization algorithms. In this paper, we study the worst-case complexity of the efficient global optimization problem and, in contrast to existing kernel-specific results, we derive a unified lower bound for the complexity of efficient global optimization in terms of the metric entropy of a ball in its corresponding reproducing kernel Hilbert space~(RKHS). Specifically, we show that if there exists a deterministic algorithm that achieves suboptimality gap smaller than $ε$ for any function $f\in S$ in $T$ function evaluations, it is necessary that $T$ is at least $Ω\left(\frac{\log\mathcal{N}(S(\mathcal{X}), 4ε,\|\cdot\|_\infty)}{\log(\frac{R}ε)}\right)$, where $\mathcal{N}(\cdot,\cdot,\cdot)$ is the covering number, $S$ is the ball centered at $0$ with radius $R$ in the RKHS and $S(\mathcal{X})$ is the restriction of $S$ over the feasible set $\mathcal{X}$. Moreover, we show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and the Matérn kernel with a large smoothness parameter $ν$, up to a replacement of $d/2$ by $d$ and a logarithmic term $\log\frac{R}ε$. That is to say, our lower bound is nearly optimal for these kernels.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes