LG · ML · Nov 11, 2018

Adapting multi-armed bandits policies to contextual bandits scenarios

arXiv:1811.04383v2 · 36 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in reinforcement learning, offering researchers incremental improvements in the scalability and flexibility of contextual bandit methods.

The paper adapts multi-armed bandits policies to online contextual bandits with binary rewards, using classification algorithms as black-box oracles. It finds that the Adaptive-Greedy algorithm often outperforms upper confidence bound and Thompson sampling strategies, at the cost of more hyperparameters to tune.

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles. Some of these adaptations are achieved through bootstrapping or approximate bootstrapping, while others rely on other forms of randomness, resulting in more scalable approaches than previous works, and the ability to work with any type of classification algorithm. In particular, the Adaptive-Greedy algorithm shows a lot of promise, in many cases achieving better performance than upper confidence bound and Thompson sampling strategies, at the expense of more hyperparameters to tune.
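The bootstrapping idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class names `LogisticOracle` and `BootstrappedTS`, the helper `simulate`, and all hyperparameter values are hypothetical, and the tiny SGD logistic regression merely stands in for whatever binary classifier is used as the black-box oracle. Each arm keeps several oracles, each refit on a bootstrap resample of that arm's history. Picking one resampled oracle at random per arm roughly mimics a Thompson-style draw.

```python
import math
import random


class LogisticOracle:
    """Tiny SGD logistic regression; a stand-in for any binary classifier."""

    def __init__(self, dim, lr=0.1, epochs=5):
        self.w = [0.0] * (dim + 1)  # last entry is the bias term
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, x):
        z = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        self.w = [0.0] * len(self.w)  # refit from scratch
        for _ in range(self.epochs):
            for x, r in zip(X, y):
                g = self.predict_proba(x) - r  # gradient of the log loss
                for j, xj in enumerate(x):
                    self.w[j] -= self.lr * g * xj
                self.w[-1] -= self.lr * g
        return self


class BootstrappedTS:
    """Hypothetical sketch: one set of bootstrapped oracles per arm."""

    def __init__(self, n_arms, dim, n_samples=5, seed=0):
        self.rng = random.Random(seed)
        self.history = [([], []) for _ in range(n_arms)]  # (contexts, rewards)
        self.oracles = [[LogisticOracle(dim) for _ in range(n_samples)]
                        for _ in range(n_arms)]

    def choose(self, x):
        # Draw one bootstrapped oracle per arm and score the context with it.
        scores = [self.rng.choice(arm).predict_proba(x) for arm in self.oracles]
        best = max(scores)
        # Break ties at random (all scores are 0.5 before any update).
        return self.rng.choice([a for a, s in enumerate(scores) if s == best])

    def update(self, arm, x, reward):
        X, y = self.history[arm]
        X.append(x)
        y.append(reward)
        for oracle in self.oracles[arm]:
            # Refit each oracle on a resample (with replacement) of the arm's data.
            idx = [self.rng.randrange(len(X)) for _ in X]
            oracle.fit([X[i] for i in idx], [y[i] for i in idx])


def simulate(rounds=200, seed=1):
    """Toy environment (an assumption): arm k pays off when feature k is on."""
    rng = random.Random(seed)
    policy = BootstrappedTS(n_arms=2, dim=2, seed=seed)
    total = 0
    for _ in range(rounds):
        k = rng.randrange(2)
        x = [1.0 if j == k else 0.0 for j in range(2)]
        arm = policy.choose(x)
        reward = 1 if (arm == k and rng.random() < 0.9) else 0
        policy.update(arm, x, reward)
        total += reward
    return total
```

Calling `simulate()` returns the total reward collected in the toy environment. Refitting every oracle from scratch on each update keeps the sketch short but is slow. The approximate (online) bootstrapping mentioned in the abstract is one way to avoid that cost.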

Code Implementations: 2 repos
