MLLGJun 2, 2024

Lasso Bandit with Compatibility Condition on Optimal Arm

arXiv:2406.00823v24 citations
Originality Incremental advance
AI Analysis

This work provides a more efficient solution for high-dimensional bandit problems with sparse rewards, benefiting applications like online advertising and recommendation systems, though it is incremental in refining theoretical assumptions.

The paper tackles the problem of achieving logarithmic regret in stochastic sparse linear bandits by showing that a compatibility condition on the optimal arm alone, without additional diversity assumptions, is sufficient to derive an O(poly log dT) regret bound, and proposes an algorithm that outperforms existing methods under weaker assumptions.

We consider a stochastic sparse linear bandit problem where only a sparse subset of context features affects the expected reward function, i.e., the unknown reward parameter has a sparse structure. In the existing Lasso bandit literature, the compatibility conditions, together with additional diversity conditions on the context features are imposed to achieve regret bounds that only depend logarithmically on the ambient dimension $d$. In this paper, we demonstrate that even without the additional diversity assumptions, the \textit{compatibility condition on the optimal arm} is sufficient to derive a regret bound that depends logarithmically on $d$, and our assumption is strictly weaker than those used in the lasso bandit literature under the single-parameter setting. We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(\text{poly}\log dT)$ regret under the margin condition. To our knowledge, the proposed algorithm requires the weakest assumptions among Lasso bandit algorithms under the single-parameter setting that achieve $O(\text{poly}\log dT)$ regret. Through numerical experiments, we confirm the superior performance of our proposed algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes