Yasong Feng

h-index11

4papers

46citations

Novelty70%

AI Score32

Ranked #124,088 of 194,257 authors (top 64%)#27,291 in LG (top 68%)

4 Papers

12.3LGFeb 3, 2023

A Lipschitz Bandits Approach for Continuous Hyperparameter Optimization

Yasong Feng, Weijian Luo, Yimin Huang et al.

One of the most critical problems in machine learning is HyperParameter Optimization (HPO), since choice of hyperparameters has a significant impact on final model performance. Although there are many HPO algorithms, they either have no theoretical guarantees or require strong assumptions. To this end, we introduce BLiE -- a Lipschitz-bandit-based algorithm for HPO that only assumes Lipschitz continuity of the objective function. BLiE exploits the landscape of the objective function to adaptively search over the hyperparameter space. Theoretically, we show that $(i)$ BLiE finds an $ε$-optimal hyperparameter with $\mathcal{O} \left( ε^{-(d_z + β)}\right)$ total budgets, where $d_z$ and $β$ are problem intrinsic; $(ii)$ BLiE is highly parallelizable. Empirically, we demonstrate that BLiE outperforms the state-of-the-art HPO algorithms on benchmark tasks. We also apply BLiE to search for noise schedule of diffusion models. Comparison with the default schedule shows that BLiE schedule greatly improves the sampling speed.

5.8LGMay 29, 2022Code

Stochastic Zeroth Order Gradient and Hessian Estimators: Variance Reduction and Refined Bias Bounds

Yasong Feng, Tianyu Wang

We study stochastic zeroth order gradient and Hessian estimators for real-valued functions in $\mathbb{R}^n$. We show that, via taking finite difference along random orthogonal directions, the variance of the stochastic finite difference estimators can be significantly reduced. In particular, we design estimators for smooth functions such that, if one uses $ Θ\left( k \right) $ random directions sampled from the Stiefel's manifold $ \text{St} (n,k) $ and finite-difference granularity $δ$, the variance of the gradient estimator is bounded by $ \mathcal{O} \left( \left( \frac{n}{k} - 1 \right) + \left( \frac{n^2}{k} - n \right) δ^2 + \frac{ n^2 δ^4 }{ k } \right) $, and the variance of the Hessian estimator is bounded by $\mathcal{O} \left( \left( \frac{n^2}{k^2} - 1 \right) + \left( \frac{n^4}{k^2} - n^2 \right) δ^2 + \frac{n^4 δ^4 }{k^2} \right) $. When $k = n$, the variances become negligibly small. In addition, we provide improved bias bounds for the estimators. The bias of both gradient and Hessian estimators for smooth function $f$ is of order $\mathcal{O} \left( δ^2 Γ\right)$, where $δ$ is the finite-difference granularity, and $ Γ$ depends on high order derivatives of $f$. Our results are evidenced by empirical observations.

3.8LGMay 19, 2023

From Random Search to Bandit Learning in Metric Measure Spaces

Chuying Han, Yasong Feng, Tianyu Wang

Random Search is one of the most widely-used method for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe the underlying working mechanism. This paper gives a theoretical accounting of Random Search. We introduce the concept of \emph{scattering dimension} that describes the landscape of the underlying function, and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded $iid$ noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that under mild conditions, BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.

13.6LGOct 19, 2021

Lipschitz Bandits with Batched Feedback

Yasong Feng, Zengfeng Huang, Tianyu Wang

In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves theoretically optimal (up to logarithmic factors) regret rate $\widetilde{\mathcal{O}}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ using only $ \mathcal{O} \left( \log\log T\right) $ batches. We also provide complexity analysis for this problem. Our theoretical lower bound implies that $Ω(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves optimal regret rate (up to logarithmic factors) using minimal communication.