LG, ML | Jan 23, 2019

Meta-Learning for Contextual Bandit Exploration

arXiv:1901.08159v1 | 20 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient exploration in interactive decision-making, with applications such as recommendation systems. The advance is incremental: it builds on existing meta-learning and contextual bandit methods.

The paper tackles the exploration-exploitation trade-off in contextual bandits by proposing MELEE, a meta-learning algorithm that learns an exploration policy on synthetic data via imitation learning and then applies it to real tasks. Compared against seven baselines on 300 real-world datasets, it outperforms the alternatives in most settings, particularly when differences in rewards are large.

We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the action taken, thereby generating an exploration/exploitation trade-off. MELEE addresses this trade-off by learning a good exploration strategy for offline tasks based on synthetic data, on which it can simulate the contextual bandit setting. Based on these simulations, MELEE uses an imitation learning strategy to learn a good exploration policy that can then be applied to true contextual bandit tasks at test time. We compare MELEE to seven strong baseline contextual bandit algorithms on a set of three hundred real-world datasets, on which it outperforms alternatives in most settings, especially when differences in rewards are large. Finally, we demonstrate the importance of having a rich feature representation for learning how to explore.
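The simulation step described in the abstract can be illustrated with a minimal sketch. This is not MELEE itself: the learned exploration policy is replaced here by a fixed epsilon-greedy rule, and the synthetic task generator, the linear reward model, and all names (`make_synthetic_task`, `run_epsilon_greedy`, `lr`) are assumptions made for illustration. The sketch shows only how a fully labeled synthetic task lets one simulate bandit feedback, where the learner sees the reward of the chosen action and nothing else.

```python
import random


def make_synthetic_task(n, n_actions, dim, rng):
    """Generate a fully labeled synthetic task (hypothetical generator).

    Each example is (context, best_action), where best_action is the
    argmax of a random linear scorer. Full labels are what make it
    possible to simulate any policy's reward offline.
    """
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_actions)]
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
        best = max(range(n_actions), key=scores.__getitem__)
        data.append((x, best))
    return data


def run_epsilon_greedy(data, n_actions, dim, epsilon, rng, lr=0.1):
    """Simulate the contextual bandit loop with epsilon-greedy exploration.

    Returns the average reward. Only the chosen action's reward is
    revealed to the learner, which creates the explore/exploit trade-off.
    """
    # One linear reward estimator per action (an illustrative choice).
    model = [[0.0] * dim for _ in range(n_actions)]
    total_reward = 0.0
    for x, best in data:
        preds = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in model]
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore uniformly
        else:
            action = max(range(n_actions), key=preds.__getitem__)  # exploit
        reward = 1.0 if action == best else 0.0
        total_reward += reward
        # Squared-loss SGD step on the chosen action's estimator only.
        err = preds[action] - reward
        model[action] = [w_i - lr * err * x_i
                         for w_i, x_i in zip(model[action], x)]
    return total_reward / len(data)


rng = random.Random(0)
task = make_synthetic_task(n=500, n_actions=3, dim=5, rng=rng)
avg = run_epsilon_greedy(task, n_actions=3, dim=5, epsilon=0.1, rng=rng)
print(f"average reward: {avg:.3f}")
```

In the setting the abstract describes, the fixed `epsilon` rule is the part that would be replaced by a policy learned through imitation on many such simulated tasks.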

