ML · LG · ST · Oct 28, 2021

Open Problem: Tight Online Confidence Intervals for RKHS Elements

arXiv:2110.15458v1 · 24 citations
Originality: Synthesis-oriented
AI Analysis

This addresses a critical issue for researchers and practitioners in online learning, although it is an incremental step focused on improving the theoretical guarantees of kernelized algorithms.

The paper tackles suboptimal regret bounds in kernel-based bandit and reinforcement learning algorithms, which stem from loose online confidence intervals for RKHS elements, and highlights that existing bounds may not even be sublinear.
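To see how an existing bound can fail to be sublinear, the sketch below does the exponent arithmetic under two assumptions that are not stated on this page: that the regret bound has the commonly cited form Õ(γ_T √T), where γ_T is the maximal information gain after T rounds, and that a Matérn kernel with smoothness ν on a d-dimensional domain has γ_T = Õ(T^{d/(2ν+d)}). Log factors are dropped, and the function names are illustrative only.

```python
from fractions import Fraction


def gp_ucb_regret_exponent(nu: Fraction, d: int) -> Fraction:
    """Exponent a in the regret bound T^a, assuming regret = O(gamma_T * sqrt(T))
    and gamma_T ~ T^(d / (2*nu + d)) for a Matern-nu kernel (log factors dropped)."""
    gamma_exponent = Fraction(d) / (2 * nu + d)
    return Fraction(1, 2) + gamma_exponent


def is_sublinear(nu: Fraction, d: int) -> bool:
    # The bound is sublinear only if the exponent is strictly below 1,
    # which under these assumptions is equivalent to d < 2 * nu.
    return gp_ucb_regret_exponent(nu, d) < 1


# Smooth kernel, low dimension: exponent 1/2 + 1/6 = 2/3, so sublinear.
print(gp_ucb_regret_exponent(Fraction(5, 2), 1), is_sublinear(Fraction(5, 2), 1))
# Rough kernel, d = 2: exponent 1/2 + 2/3 = 7/6, so the bound exceeds linear.
print(gp_ucb_regret_exponent(Fraction(1, 2), 2), is_sublinear(Fraction(1, 2), 2))
```

Exact `Fraction` arithmetic avoids floating-point noise at the boundary case d = 2ν, where the exponent is exactly 1 and the bound is at best linear up to log factors.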

Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel-based bandit and reinforcement learning problems utilizes confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points. We formalize the question of online confidence intervals in the RKHS setting and give an overview of the existing results.
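A minimal sketch of the kind of confidence interval discussed above, assuming an RBF kernel, kernel ridge regression with regularizer `lam`, and a fixed width multiplier `beta`. In the theoretical analyses that multiplier grows with the information gain; here it is a hypothetical constant. The GP-UCB loop shows the online aspect: each query point is the maximizer of the upper bound, so it depends on all previous noisy observations. The names `rbf_kernel`, `confidence_bounds`, and `gp_ucb`, and the test function `sin(3x)`, are illustrative assumptions, not the paper's method.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of a and the rows of b.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def confidence_bounds(X_obs, y_obs, X_query, lam=0.1, beta=2.0):
    """Return (lower, mean, upper) = mu_t -/+ beta * sigma_t at each query point."""
    K = rbf_kernel(X_obs, X_obs)
    k_q = rbf_kernel(X_obs, X_query)           # shape (t, m)
    A = K + lam * np.eye(len(X_obs))           # regularized Gram matrix
    mean = k_q.T @ np.linalg.solve(A, y_obs)   # kernel ridge regression mean mu_t
    v = np.linalg.solve(A, k_q)
    var = 1.0 - np.sum(k_q * v, axis=0)        # k(x, x) = 1 for the RBF kernel
    sigma = np.sqrt(np.maximum(var, 0.0))
    return mean - beta * sigma, mean, mean + beta * sigma


def gp_ucb(f, candidates, T=20, noise=0.1, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = candidates[[0]]                        # arbitrary first query, shape (1, 1)
    y = np.array([f(X[0, 0]) + noise * rng.standard_normal()])
    for _ in range(T - 1):
        _, _, upper = confidence_bounds(X, y, candidates, lam=noise**2, beta=beta)
        x_next = candidates[[np.argmax(upper)]]   # choice depends on past noisy y
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0, 0]) + noise * rng.standard_normal())
    return X, y


candidates = np.linspace(0.0, 1.0, 50)[:, None]
X, y = gp_ucb(lambda x: np.sin(3 * x), candidates, T=10)
lower, mean, upper = confidence_bounds(X, y, candidates, lam=0.01)
```

Setting `lam = noise**2` matches the Gaussian-process posterior reading of kernel ridge regression. Because each `x_next` is chosen from past noisy values, the observation points are not independent of the noise, which is the sequential dependence the abstract identifies as the main obstacle to tight intervals.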
