MLApr 19, 2022
Compressed Empirical Measures (in finite dimensions)Steffen Grünewälder
We study approaches for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high probability lower bounds on the size of such a ball under various conditions and in various settings: we show how conditions on the density of the data and the kernel function can be used to infer such lower bounds; we further develop an approach that uses a lower bound on the smallest eigenvalue of a covariance operator to provide lower bounds on the size of such a ball; we extend the approach to approximate covariance operators and we show how it can be used in the context of kernel ridge regression. We also derive compression guarantees when standard algorithms like the conditional gradient method are used and we discuss variations of such algorithms to improve the runtime of these standard algorithms. We conclude with a construction of an infinite dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite dimensional RKHSs.
MLMar 15, 2025
Support Collapse of Deep Gaussian Processes with Polynomial Kernels for a Wide Regime of HyperparametersDaryna Chernobrovkina, Steffen Grünewälder
We analyze the prior that a Deep Gaussian Process with polynomial kernels induces. We observe that, even for relatively small depths, averaging effects occur within such a Deep Gaussian Process and that the prior can be analyzed and approximated effectively by means of the Berry-Esseen Theorem. One of the key findings of this analysis is that, in the absence of careful hyper-parameter tuning, the prior of a Deep Gaussian Process either collapses rapidly towards zero as the depth increases or places negligible mass on low norm functions. This aligns well with experimental findings and mirrors known results for convolution based Deep Gaussian Processes.
MLFeb 7, 2020
Oblivious Data for Fairness with KernelsSteffen Grünewälder, Azadeh Khaleghi
We investigate the problem of algorithmic fairness in the case where sensitive and non-sensitive features are available and one aims to generate new, `oblivious', features that closely approximate the non-sensitive features, and are only minimally dependent on the sensitive ones. We study this question in the context of kernel methods. We analyze a relaxed version of the Maximum Mean Discrepancy criterion which does not guarantee full independence but makes the optimization problem tractable. We derive a closed-form solution for this relaxed optimization problem and complement the result with a study of the dependencies between the newly generated features and the sensitive ones. Our key ingredient for generating such oblivious features is a Hilbert-space-valued conditional expectation, which needs to be estimated from data. We propose a plug-in approach and demonstrate how the estimation errors can be controlled. While our techniques help reduce the bias, we would like to point out that no post-processing of any dataset could possibly serve as an alternative to well-designed experiments.
MLOct 31, 2019
Recovering BanditsCiara Pike-Burke, Steffen Grünewälder
We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played. While being a natural extension of the classical bandit problem that arises in many real-world settings, this variation is accompanied by significant difficulties. In particular, methods need to plan ahead and estimate many more quantities than in the classical bandit setting. In this work, we explore the use of Gaussian processes to tackle the estimation and planing problem. We also discuss different regret definitions that let us quantify the performance of the methods. To improve computational efficiency of the methods, we provide an optimistic planning approximation. We complement these discussions with regret bounds and empirical studies.
LGMay 21, 2012
Conditional mean embeddings as regressors - supplementarySteffen Grünewälder, Guy Lever, Luca Baldassarre et al.
We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link we derive a sparse version of the embedding by considering alternative formulations. Further, by applying convergence results for vector-valued regression to the embedding problem we derive minimax convergence rates which are O(\log(n)/n) -- compared to current state of the art rates of O(n^{-1/4}) -- and are valid under milder and more intuitive assumptions. These minimax upper rates coincide with lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm in a reinforcement learning task where the algorithm shows significant improvement in sparsity over an incomplete Cholesky decomposition.