ML LGOct 31, 2024

Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

arXiv:2411.00229v27.53 citationsh-index: 2Has Code

Originality Incremental advance

AI Analysis

This work addresses efficient decision-making in linear bandits, offering a practical solution for off-policy evaluation, though it is incremental as it extends an existing method to a linear context.

The authors tackled the problem of linear bandits by proposing LinMED, a randomized algorithm with closed-form sampling probabilities, achieving near-optimal regret bounds of $d\sqrt{n}$ and $\frac{d^2}{\Delta}\log^2(n)\log(\log(n))$ for sub-Gaussian settings.

We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm that was originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling. Such a feature proves useful for off-policy evaluation where the unbiased evaluation requires accurately computing the sampling probability. We prove that LinMED enjoys a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a $\frac{d^2}Δ\left(\log^2(n)\right)\log\left(\log(n)\right)$ problem-dependent regret where $Δ$ is the smallest sub-optimality gap. Our empirical study shows that LinMED has a competitive performance with the state-of-the-art algorithms.

View on arXiv PDF Code

Similar