LGMLMar 16, 2023

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

arXiv:2303.09390v17 citationsh-index: 15
Originality Highly original
AI Analysis

This work addresses the challenge of robust decision-making in bandit algorithms for scenarios with approximate linear models, providing theoretical guarantees for practitioners in fields like recommendation systems.

The paper tackles the problem of linear contextual bandits with model misspecification, showing that efficient learning is possible when the misspecification level is bounded relative to the sub-optimality gap, achieving a regret bound of ̃O(d^2/Δ) under certain conditions.

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $ζ>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $ζ$ is dominated by $\tilde O (Δ/ \sqrt{d})$ with $Δ$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/Δ)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $Δ$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $ζ\leq \tilde O(Δ/ \sqrt{d})$; and (2) it is not efficiently learnable when $ζ\geq \tilde Ω(Δ / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes