ML LG STJun 29, 2023

Kernel $ε$-Greedy for Multi-Armed Bandits with Covariates

arXiv:2306.17329v22.3h-index: 34

Originality Incremental advance

AI Analysis

This work addresses the problem of optimizing decision-making in sequential settings with contextual information for applications like online advertising or recommendation systems, representing an incremental improvement by extending ε-greedy methods to kernel-based settings with theoretical guarantees.

The paper tackles the multi-armed bandit with covariates problem by proposing a kernel ε-greedy strategy that uses online weighted kernel ridge regression to estimate reward functions in an RKHS, achieving a sub-linear regret rate dependent on the RKHS dimensionality and an optimal √T rate under a margin condition for finite-dimensional RKHS.

We consider the $ε$-greedy strategy for the multi-arm bandit with covariates (MABC) problem, where the mean reward functions are assumed to lie in a reproducing kernel Hilbert space (RKHS). We propose to estimate the unknown mean reward functions using an online weighted kernel ridge regression estimator, and show the resultant estimator to be consistent under appropriate decay rates of the exploration probability sequence, $\{ε_t\}_t$, and regularization parameter, $\{λ_t\}_t$. Moreover, we show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHS.

View on arXiv PDF

Similar