LG · ML · Mar 19, 2024

Transfer in Sequential Multi-armed Bandits via Reward Samples

arXiv:2403.12428v1 · 3 citations · ECC
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting to non-stationary environments in bandit problems. The advance is incremental: it builds on existing UCB methods by adding transfer learning.

The paper tackles sequential multi-armed bandits whose reward distributions change across episodes. It proposes a UCB-based algorithm that transfers reward samples from previous episodes, and it reports a significant improvement in cumulative regret over standard UCB without transfer.

We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes. The reward distributions of the arms remain constant throughout an episode but can change across episodes. We propose an algorithm based on UCB that transfers the reward samples from previous episodes to improve the cumulative regret over all the episodes. We provide a regret analysis and empirical results for our algorithm, which show significant improvement over the standard UCB algorithm without transfer.
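To make the idea of reusing reward samples concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes Bernoulli rewards, and the names `prior_sums`, `prior_counts`, and `eps` are hypothetical. `eps` stands for an assumed bound on how far an arm's mean may have drifted from the pooled earlier samples. Each arm's index is the smaller of the plain UCB1 index and a pooled index widened by a bias allowance.

```python
import math
import random


def ucb_episode(means, horizon, rng, prior_sums=None, prior_counts=None, eps=0.0):
    """Run one episode of UCB1 on Bernoulli arms, optionally reusing old samples.

    prior_sums / prior_counts (hypothetical names) hold per-arm reward totals
    and pull counts carried over from earlier episodes. eps is an assumed bound
    on how far an arm's mean may have moved since those samples were collected.
    Returns (pseudo-regret, per-arm reward sums, per-arm pull counts).
    """
    k = len(means)
    sums = [0.0] * k      # reward totals observed in this episode
    counts = [0] * k      # pulls made in this episode
    p_sums = prior_sums if prior_sums is not None else [0.0] * k
    p_counts = prior_counts if prior_counts is not None else [0] * k
    best = max(means)
    regret = 0.0

    def index(arm, t):
        n_cur = counts[arm]
        if n_cur == 0:
            cur = math.inf  # unpulled arm: plain UCB1 forces exploration
        else:
            cur = sums[arm] / n_cur + math.sqrt(2.0 * math.log(t) / n_cur)
        if p_counts[arm] == 0:
            return cur      # nothing to transfer for this arm
        n_all = n_cur + p_counts[arm]
        pooled = (sums[arm] + p_sums[arm]) / n_all
        bias = eps * p_counts[arm] / n_all  # allowance for drifted old samples
        transfer = pooled + math.sqrt(2.0 * math.log(t) / n_all) + bias
        return min(cur, transfer)  # keep the tighter of the two upper bounds

    for t in range(1, horizon + 1):
        arm = max(range(k), key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
        regret += best - means[arm]
    return regret, sums, counts


def run_episodes(episode_means, horizon, eps, seed=0):
    """Chain episodes, pooling every earlier sample into the next episode."""
    rng = random.Random(seed)
    k = len(episode_means[0])
    p_sums, p_counts = [0.0] * k, [0] * k
    total = 0.0
    for means in episode_means:
        regret, sums, counts = ucb_episode(
            means, horizon, rng, p_sums, p_counts, eps
        )
        total += regret
        p_sums = [a + b for a, b in zip(p_sums, sums)]
        p_counts = [a + b for a, b in zip(p_counts, counts)]
    return total
```

Calling `ucb_episode` without the prior arguments gives standard UCB1, so the two can be compared on the same sequence of `means`. Taking the minimum of the two indices is a design choice of this sketch: when `eps` is large the transferred bound becomes loose and the index falls back to the no-transfer one.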

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
