IR · LG · Aug 16, 2019

Accelerated learning from recommender systems using multi-armed bandit

arXiv:1908.06158v1 · 4 citations
AI Analysis

This paper addresses the need for companies to iterate rapidly on recommender systems, but it is incremental: it applies existing MAB methods to a specific testing bottleneck.

The paper tackles the problem of evaluating recommender system algorithms by proposing multi-armed bandit (MAB) testing as a solution to the long lead time and high cost of A/B testing, and presents experimental results comparing offline, MAB, and online A/B test metrics.

Recommendation systems are a vital component of many online marketplaces, where there are often millions of items to potentially present to users who have a wide variety of wants or needs. Evaluating recommender system algorithms is a hard task, given all the inherent bias in the data, and successful companies must be able to rapidly iterate on their solution to maintain their competitive advantage. The gold standard for evaluating recommendation algorithms has been the A/B test, since it is an unbiased way to estimate how well one or more algorithms compare in the real world. However, a number of issues make A/B testing impractical as the sole method of testing, including long lead time and high cost of exploration. We argue for multi-armed bandit (MAB) testing as a solution to these issues. We showcase how we implemented an MAB solution as an extra step between offline and online A/B testing in a production system. We present the results of our experiment and compare the offline, MAB, and online A/B test metrics for our use case.
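To make the idea of MAB testing concrete, here is a minimal sketch of how a bandit can shift traffic toward the better-performing recommender while the test is still running, which is what reduces the cost of exploration relative to a fixed-split A/B test. The abstract does not say which bandit policy the authors used; Thompson sampling with Bernoulli (click / no click) rewards is an assumption chosen for illustration, and the function name, parameters, and click-through rates below are all hypothetical.

```python
import random


def thompson_sampling(true_ctrs, n_rounds, seed=0):
    """Simulate Bernoulli Thompson sampling over candidate recommenders.

    true_ctrs: hypothetical click-through rate of each candidate algorithm
               (unknown to the bandit; used only to simulate user feedback).
    Returns (pulls, successes): per-arm impression and click counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    successes = [0] * n_arms  # observed clicks per arm
    failures = [0] * n_arms   # observed non-clicks per arm
    pulls = [0] * n_arms      # impressions served per arm

    for _ in range(n_rounds):
        # Draw a plausible CTR for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then serve the arm with the highest draw.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated user feedback: a click with probability true_ctrs[arm].
        clicked = rng.random() < true_ctrs[arm]
        pulls[arm] += 1
        if clicked:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes


if __name__ == "__main__":
    # Three hypothetical recommender variants with made-up CTRs.
    pulls, clicks = thompson_sampling([0.05, 0.10, 0.30], n_rounds=5000)
    for arm, (n, c) in enumerate(zip(pulls, clicks)):
        print(f"arm {arm}: impressions={n}, clicks={c}")
```

Unlike an A/B test, which holds the traffic split fixed until the test ends, this policy sends fewer and fewer impressions to arms whose posterior CTR looks poor, so weak candidates can be screened out cheaply before a full online A/B test.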
