The Statistical Cost of Robust Kernel Hyperparameter Tuning
This addresses the statistical cost of hyperparameter tuning for researchers in kernel methods and robust regression, providing finite-sample guarantees that are incremental but specific to this setting.
The paper tackles the problem of kernel hyperparameter tuning in active regression with adversarial noise, showing that for common kernel classes like squared-exponential, hyperparameter optimization increases sample complexity by only a logarithmic factor compared to knowing optimal parameters in advance.
This paper studies the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise. We consider the problem of finding the best interpolant from a class of kernels with unknown hyperparameters, assuming only that the noise is square-integrable. We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters. For common kernel classes (e.g. squared-exponential kernels with unknown lengthscale), our results show that hyperparameter optimization increases sample complexity by just a logarithmic factor, in comparison to the setting where optimal parameters are known in advance. Our result is based on a subsampling guarantee for linear regression under multiple design matrices, combined with an ε-net argument for discretizing kernel parameterizations.