Gradient-free stochastic optimization for additive models
This addresses optimization problems where gradient information is unavailable, but the result is incremental as it extends known frameworks without substantial improvement.
The paper tackles zero-order optimization for additive models under smoothness conditions, showing that the minimax optimal error is dT^{-(β-1)/β}, and concludes that additive models do not provide significant accuracy gains in this context compared to nonparametric estimation.
We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions. The additive model for Hölder classes of functions is well-studied in the literature on nonparametric function estimation, where it is shown that such a model benefits from a substantial improvement of the estimation accuracy compared to the Hölder model without additive structure. We study this established framework in the context of gradient-free optimization. We propose a randomized gradient estimator that, when plugged into a gradient descent algorithm, allows one to achieve minimax optimal optimization error of the order $dT^{-(β-1)/β}$, where $d$ is the dimension of the problem, $T$ is the number of queries and $β\ge 2$ is the Hölder degree of smoothness. We conclude that, in contrast to nonparametric estimation problems, no substantial gain of accuracy can be achieved when using additive models in gradient-free optimization.