MLLGJun 11, 2022

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

arXiv:2206.05404v36 citationsh-index: 66
Originality Highly original
AI Analysis

This addresses the efficiency of decision-making in sequential settings with linear contexts, offering a near-optimal solution with theoretical guarantees.

The paper tackles the problem of linear contextual bandits by proposing an algorithm with an $O(\sqrt{dT\log T})$ regret bound, matching a new lower bound of $\Omega(\sqrt{dT})$ up to logarithmic factors, and outperforms existing methods in experiments.

We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ isthe time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contributions either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into \textit{additive} dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $Ω(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes