LGOCMay 29, 2022

Stochastic Zeroth Order Gradient and Hessian Estimators: Variance Reduction and Refined Bias Bounds

arXiv:2205.14737v317 citationsh-index: 4
Originality Incremental advance
AI Analysis

This addresses variance reduction in zeroth-order optimization methods, which is crucial for applications like black-box optimization where gradient information is unavailable, but it is incremental as it builds on existing finite-difference estimators.

The paper tackles the problem of high variance in stochastic zeroth-order gradient and Hessian estimators for functions in ℝⁿ by using random orthogonal directions, showing that with k directions and granularity δ, gradient variance is bounded by O((n/k - 1) + (n²/k - n)δ² + n²δ⁴/k) and Hessian variance by O((n²/k² - 1) + (n⁴/k² - n²)δ² + n⁴δ⁴/k²), with negligible variance when k=n, and provides improved bias bounds of O(δ²Γ).

We study stochastic zeroth order gradient and Hessian estimators for real-valued functions in $\mathbb{R}^n$. We show that, via taking finite difference along random orthogonal directions, the variance of the stochastic finite difference estimators can be significantly reduced. In particular, we design estimators for smooth functions such that, if one uses $ Θ\left( k \right) $ random directions sampled from the Stiefel's manifold $ \text{St} (n,k) $ and finite-difference granularity $δ$, the variance of the gradient estimator is bounded by $ \mathcal{O} \left( \left( \frac{n}{k} - 1 \right) + \left( \frac{n^2}{k} - n \right) δ^2 + \frac{ n^2 δ^4 }{ k } \right) $, and the variance of the Hessian estimator is bounded by $\mathcal{O} \left( \left( \frac{n^2}{k^2} - 1 \right) + \left( \frac{n^4}{k^2} - n^2 \right) δ^2 + \frac{n^4 δ^4 }{k^2} \right) $. When $k = n$, the variances become negligibly small. In addition, we provide improved bias bounds for the estimators. The bias of both gradient and Hessian estimators for smooth function $f$ is of order $\mathcal{O} \left( δ^2 Γ\right)$, where $δ$ is the finite-difference granularity, and $ Γ$ depends on high order derivatives of $f$. Our results are evidenced by empirical observations.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes