LG · AI · ML · Sep 7, 2019

AutoML for Contextual Bandits

arXiv:1909.03212v2 · 7 citations · Has Code
AI Analysis

This paper offers an incremental improvement for practitioners in areas such as personalization and recommendation systems, using meta-learning to simplify deployment.

The paper automates the optimization of Q-functions for contextual bandits, aiming for better efficiency than methods such as A/B testing. The resulting model matches or outperforms prior work without tuning or feature engineering, and it converges with a limited number of samples.

Contextual bandits are a widely used technique in applications such as personalization, recommendation systems, mobile health, and causal marketing. As a dynamic approach, they can be more efficient than standard A/B testing at minimizing regret. We propose an end-to-end automated meta-learning pipeline to approximate the optimal Q-function for contextual bandit problems. Our model performs much better than random exploration: it is more regret-efficient and converges with a limited number of samples, while remaining general and easy to use thanks to the meta-learning approach. We used a linearly annealed ε-greedy exploration policy to define the exploration vs. exploitation schedule. We tested the system on a synthetic environment to characterize it fully, and we evaluated it on several open-source datasets to benchmark against prior work. Our model outperforms or performs comparably to other models while requiring neither tuning nor feature engineering.
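
The linearly annealed ε-greedy schedule mentioned in the abstract can be sketched roughly as below. This is a minimal illustration, not the authors' code: the function names `linear_epsilon` and `select_action`, the start and end values (1.0 and 0.05), and the annealing horizon are all assumptions, since the abstract does not give them.

```python
import random


def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal epsilon from eps_start to eps_end over total_steps.

    eps_start, eps_end and total_steps are illustrative assumptions.
    """
    if total_steps <= 0:
        return eps_end
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - frac) * eps_start + frac * eps_end


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over estimated Q-values for one context.

    With probability epsilon pick a uniformly random action (explore),
    otherwise pick the action with the highest estimate (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a bandit loop, one would compute `linear_epsilon(t, T)` at each round `t`, pass it to `select_action` along with the current Q-value estimates for the observed context, and then update the estimates with the observed reward. Early rounds are dominated by exploration and later rounds by exploitation.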

