Asymptotic convergence rates for averaging strategies
This work provides incremental theoretical foundations for parallel optimization algorithms. It benefits researchers and practitioners in machine learning and optimization by offering formal guarantees that averaging strategies improve performance in parallel settings.
The paper extends theoretical convergence rates for averaging strategies in parallel black-box optimization from quadratic functions to a broader class of three times continuously differentiable functions with unique optima, proving that these strategies outperform pure random search asymptotically as the number of parallel evaluations increases.
Parallel black-box optimization consists of estimating the optimum of a function $f$ using $λ$ parallel evaluations of $f$. Averaging the $μ$ best individuals among the $λ$ evaluations is known to provide better estimates of the optimum than just picking the best one. In continuous domains, this averaging is typically based on (possibly weighted) arithmetic means. Previous theoretical results were restricted to quadratic objective functions. In this paper, we extend the results to a wide class of functions that contains three-times continuously differentiable functions with a unique optimum. We prove formal rates of convergence and show that they are indeed better than those of pure random search asymptotically in $λ$. We validate our theoretical findings with experiments on some standard black-box functions.
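As an illustration of the averaging strategy described above, the following minimal Python sketch compares the two estimates on a single parallel batch: the best of $λ$ sampled points versus the unweighted arithmetic mean of the $μ$ best. It is not the paper's experimental setup. The Gaussian sampling distribution, the sphere objective, the location of the optimum, and the values of `dim`, `lam`, `mu`, and `trials` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def sphere(x, optimum):
    """Quadratic test objective: squared distance to the optimum."""
    return sum((xi - oi) ** 2 for xi, oi in zip(x, optimum))


def one_shot_estimates(f, dim, lam, mu, rng):
    """Draw lam points, evaluate f on each, and return two estimates of
    the optimum: the single best point and the mean of the mu best."""
    # Assumption: points are sampled from a standard Gaussian.
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(lam)]
    ranked = sorted(points, key=f)  # ascending objective value
    best = ranked[0]
    selected = ranked[:mu]
    # Unweighted arithmetic mean of the mu best points, per coordinate.
    averaged = [sum(p[i] for p in selected) / mu for i in range(dim)]
    return best, averaged


def mean_regret(dim=5, lam=200, mu=20, trials=200, seed=0):
    """Average objective value of both estimates over independent trials."""
    rng = random.Random(seed)
    optimum = [0.1] * dim  # assumed optimum, close to the sampling center

    def f(x):
        return sphere(x, optimum)

    regret_best = 0.0
    regret_avg = 0.0
    for _ in range(trials):
        best, averaged = one_shot_estimates(f, dim, lam, mu, rng)
        regret_best += f(best)
        regret_avg += f(averaged)
    return regret_best / trials, regret_avg / trials


if __name__ == "__main__":
    regret_best, regret_avg = mean_regret()
    print(f"best of lambda:  {regret_best:.4f}")
    print(f"mean of mu best: {regret_avg:.4f}")
```

Under these assumptions the averaged estimate should show a lower mean objective value than the single best point. Setting `mu=1` recovers the pick-the-best baseline, which makes it easy to vary `mu` and `lam` and observe how the gap changes.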