SY, Jul 3, 2017

On Asymptotic Properties of Hyperparameter Estimators for Kernel-based Regularization Methods

arXiv:1707.00407 · 80 citations

AI Analysis

For practitioners using kernel-based regularization, this clarifies which hyperparameter estimators are asymptotically optimal and how fast they converge, although the results are theoretical and incremental.

This paper analyzes the asymptotic properties of hyperparameter estimators for kernel-based regularization. It shows that two Stein's unbiased risk estimators (SURE) are asymptotically optimal, converging to the hyperparameter that minimizes the corresponding mean square error, while the empirical Bayes (EB) estimator is not. However, the EB estimator converges faster, and its convergence rate does not depend on how fast $\Phi^T\Phi/N$ converges to its limit, where $\Phi$ is the regression matrix and $N$ is the number of data points.
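For reference, a common setting for these estimators is sketched below. It is an assumption, not taken from the paper, whose notation and normalizing constants may differ: the linear regression $Y = \Phi\theta_0 + V$ with $V \sim \mathcal{N}(0, \sigma^2 I_N)$ and known $\sigma^2$, a positive definite kernel matrix $P(\eta)$ parameterized by the hyperparameter $\eta$, the regularized estimate $\hat\theta(\eta)$, and the EB, output-prediction SURE, and oracle criteria:

$$
\begin{aligned}
\hat\theta(\eta) &= P(\eta)\Phi^T Q(\eta)^{-1} Y, \qquad Q(\eta) = \Phi P(\eta)\Phi^T + \sigma^2 I_N,\\
\hat\eta_{\mathrm{EB}} &= \arg\min_{\eta}\; Y^T Q(\eta)^{-1} Y + \log\det Q(\eta),\\
\hat\eta_{\mathrm{SURE}_y} &= \arg\min_{\eta}\; \|Y - \Phi\hat\theta(\eta)\|^2 + 2\sigma^2 \operatorname{tr}\!\big(\Phi P(\eta)\Phi^T Q(\eta)^{-1}\big),\\
\eta^{*}_{y} &= \arg\min_{\eta}\; \mathbb{E}\,\|\Phi\theta_0 - \Phi\hat\theta(\eta)\|^2 .
\end{aligned}
$$

The expectation of the SURE criterion equals the oracle objective plus the constant $N\sigma^2$, which does not depend on $\eta$. This is why the SURE minimizer can track $\eta^*_y$, while the EB criterion targets a different objective. An analogous SURE can be formed for the parameter error $\mathbb{E}\|\hat\theta(\eta)-\theta_0\|^2$. The abstract does not say exactly which two risks the paper's SUREs target, so treat this pairing as an assumption.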

The kernel-based regularization method has two core issues: kernel design and hyperparameter estimation. In this paper, we focus on the second issue and study the properties of several hyperparameter estimators, including the empirical Bayes (EB) estimator, two Stein's unbiased risk estimators (SURE) and their corresponding Oracle counterparts, with an emphasis on their asymptotic properties. To this end, we first derive and then rewrite the first-order optimality conditions of these hyperparameter estimators, which yields several insights into them. Then we show that as the number of data points goes to infinity, the two SUREs converge to the best hyperparameters minimizing the corresponding mean square errors, respectively, while the more widely used EB estimator converges to a different best hyperparameter, namely the one minimizing the expectation of the EB estimation criterion. This indicates that the two SUREs are asymptotically optimal but the EB estimator is not. Surprisingly, the convergence rate of the two SUREs is slower than that of the EB estimator. Moreover, unlike the two SUREs, the EB estimator's convergence rate is independent of the convergence rate of $\Phi^T\Phi/N$ to its limit, where $\Phi$ is the regression matrix and $N$ is the number of data points. A Monte Carlo simulation is provided to demonstrate the theoretical results.
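The paper's Monte Carlo study is not reproduced here. The rough sketch below only illustrates the kind of comparison the abstract describes. All settings are assumptions chosen for illustration:

* an FIR model with white-noise input and known noise variance
* a TC kernel $P_{ij}(\eta) = c\,\lambda^{\max(i,j)}$ with $\eta = (c, \lambda)$
* a hypothetical true impulse response `g0`
* plain grid search

For increasing $N$, it picks $\eta$ by EB, by the output-prediction SURE, and by the oracle. The oracle needs the true $\theta_0$, so it is only available in simulation. The helper names (`tc_kernel`, `fir_regressor`, `criteria`, `select`) are hypothetical. The estimate is computed as $(PR + \sigma^2 I)^{-1} P\,\Phi^T Y$ with $R = \Phi^T\Phi$, which equals $P\Phi^T Q^{-1} Y$ by the push-through identity.

```python
import numpy as np


def tc_kernel(n, c, lam):
    """TC kernel P_ij = c * lam**max(i, j), i, j = 1..n (assumed kernel choice)."""
    idx = np.arange(1, n + 1)
    return c * lam ** np.maximum.outer(idx, idx)


def fir_regressor(u, n):
    """N x n regression matrix whose row t is [u(t-1), ..., u(t-n)], zero initial conditions."""
    N = len(u)
    Phi = np.zeros((N, n))
    for k in range(1, n + 1):
        Phi[k:, k - 1] = u[: N - k]
    return Phi


def criteria(R, b, yy, P, sigma2, theta0=None):
    """EB, SURE_y and (if theta0 is given) the oracle output MSE for one kernel matrix P.

    Uses only R = Phi^T Phi, b = Phi^T Y and yy = Y^T Y, so all algebra is n x n
    (Woodbury and Sylvester identities). Terms independent of P are dropped.
    """
    n = R.shape[0]
    A = np.linalg.inv(P @ R + sigma2 * np.eye(n))  # (P R + sigma^2 I)^{-1}
    theta_hat = A @ P @ b                           # equals P Phi^T Q^{-1} Y
    M = A @ P @ R                                   # theta_hat = M theta0 + noise part
    # EB: Y^T Q^{-1} Y + log det Q, minus the constant N log sigma^2
    _, logdet = np.linalg.slogdet(np.eye(n) + P @ R / sigma2)
    eb = (yy - b @ theta_hat) / sigma2 + logdet
    # SURE_y: ||Y - Phi theta_hat||^2 + 2 sigma^2 tr(Phi P Phi^T Q^{-1}), and that trace is tr(M)
    rss = yy - 2 * b @ theta_hat + theta_hat @ R @ theta_hat
    out = {"EB": eb, "SURE_y": rss + 2 * sigma2 * np.trace(M)}
    if theta0 is not None:
        # Oracle: E||Phi theta0 - Phi theta_hat||^2 = squared bias + variance
        e = (M - np.eye(n)) @ theta0
        out["Oracle_y"] = e @ R @ e + sigma2 * np.trace(R @ A @ P @ R @ P @ A.T)
    return out


def select(R, b, yy, sigma2, theta0, cs, lams):
    """Grid search: return the (c, lam) minimizing each criterion."""
    best = {}
    for c in cs:
        for lam in lams:
            vals = criteria(R, b, yy, tc_kernel(len(b), c, lam), sigma2, theta0)
            for name, v in vals.items():
                if name not in best or v < best[name][0]:
                    best[name] = (v, c, lam)
    return {name: (c, lam) for name, (_, c, lam) in best.items()}


rng = np.random.default_rng(0)
n, sigma2 = 20, 0.1
lags = np.arange(1, n + 1)
g0 = 0.8 ** lags * np.sin(0.5 * lags)    # hypothetical true impulse response
cs = np.logspace(-2, 1, 16)              # grid for the scale c
lams = np.linspace(0.5, 0.95, 10)        # grid for the decay rate lam
for N in (100, 1000, 10000):
    u = rng.standard_normal(N)           # white-noise input
    Phi = fir_regressor(u, n)
    Y = Phi @ g0 + np.sqrt(sigma2) * rng.standard_normal(N)
    est = select(Phi.T @ Phi, Phi.T @ Y, Y @ Y, sigma2, g0, cs, lams)
    print(N, {name: (round(float(c), 3), round(float(lam), 3)) for name, (c, lam) in est.items()})
```

The sketch uses one noise realization per $N$, so its printout is only indicative. Repeating the loop over many seeds and averaging would turn it into a Monte Carlo study. Working with $\Phi^T\Phi$, $\Phi^T Y$ and $Y^T Y$ instead of the $N \times N$ matrix $Q$ keeps each grid point cheap even for large $N$.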

