Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards
This is an incremental work that highlights a specific theoretical gap in online learning for researchers in kernelized bandits.
The paper identifies an open problem in kernelized bandits: establishing tight bounds for the Bernoulli reward model, where observations are binary draws from a Bernoulli distribution with parameter f(x_t), contrasting with the commonly studied subgaussian noise model.
We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which observations of the form $f(\mathbf{x}_t)+ε_t$, being $ε_t$ a subgaussian noise, are available (Chowdhury and Gopalan, 2017). Differently, we focus on the case in which we observe realizations $y_t \sim \text{Ber}(f(\mathbf{x}_t))$ sampled from a Bernoulli distribution with parameter $f(\mathbf{x}_t)$. While the Bernoulli model has been investigated successfully in multi-armed bandits (Garivier and Cappé, 2011), logistic bandits (Faury et al., 2022), bandits in metric spaces (Magureanu et al., 2014), it remains an open question whether tight results can be obtained for KBs. This paper aims to draw the attention of the online learning community to this open problem.