LGOct 15, 2021

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

arXiv:2110.08057v312 citations
AI Analysis

This solves the batch-regret tradeoff problem for researchers and practitioners in bandit algorithms, offering a more practical and broadly applicable solution compared to incremental prior work.

The paper tackles the optimal batch-regret tradeoff in batch linear contextual bandits, establishing an algorithm with a two-phase regret guarantee that matches a lower bound across all parameter ranges, achieving optimality up to logarithmic factors. It improves on prior work by being simpler, practical, and effective for all T ≥ d, unlike previous methods requiring unrealistically large T.

We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, due to technical reasons, features a two-phase expression as the time horizon $T$ grows. We also prove a lower bound theorem that surprisingly shows the optimality of our two-phase regret upper bound (up to logarithmic factors) in the \emph{full range} of the problem parameters, therefore establishing the exact batch-regret tradeoff. Compared to the recent work \citep{ruan2020linear} which showed that $M = O(\log \log T)$ batches suffice to achieve the asymptotically minimax-optimal regret without the batch constraints, our algorithm is simpler and easier for practical implementation. Furthermore, our algorithm achieves the optimal regret for all $T \geq d$, while \citep{ruan2020linear} requires that $T$ greater than an unrealistically large polynomial of $d$. Along our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in literature and maybe of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes