LG · ML · Sep 30, 2022

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

arXiv:2210.00025v4 · 11 citations · h-index: 33
Originality: Incremental advance
AI Analysis

The paper addresses data inefficiency in bandit algorithms for applications such as green security. The advance is incremental: it wraps existing bandit methods in a meta-algorithm rather than proposing a new base algorithm.

The paper addresses how to incorporate historical data efficiently when warm-starting bandit algorithms, where naively using every historical sample can suffer from spurious data and imbalanced coverage. It shows that ArtificialReplay uses only a fraction of the historical data while achieving identical regret for base algorithms that satisfy a new property, independence of irrelevant data (IIData).

Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to data inefficiency (amount of historical data used) - particularly for continuous action spaces. To address these challenges, we propose ArtificialReplay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. We show that ArtificialReplay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on K-armed bandits and continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of ArtificialReplay for improving data efficiency, including for base algorithms that do not satisfy IIData.
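The abstract does not spell out the mechanism, so the following is only a minimal sketch of the idea for K-armed bandits, under stated assumptions: the base algorithm proposes an arm, and if an unused historical sample for that arm exists, the sample is fed to the base algorithm in place of an online pull, so only history for arms the algorithm actually wants to play is consumed. The names `UCB1`, `artificial_replay`, `pull`, and `history` are hypothetical and are not taken from the paper's code, and the sketch makes no claim about whether this particular UCB1 variant satisfies IIData.

```python
import math
import random
from collections import defaultdict


class UCB1:
    """Illustrative UCB1 base algorithm for a K-armed bandit (an assumption,
    not the paper's implementation)."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = [0] * n_arms   # samples seen per arm (online or replayed)
        self.sums = [0.0] * n_arms   # total reward per arm

    def select_arm(self):
        # Propose any arm that has no samples yet.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def artificial_replay(base, history, pull, horizon):
    """Run `base` for `horizon` online rounds, replaying history on demand.

    history: list of (arm, reward) pairs collected before deployment.
    pull:    callable arm -> reward, representing the online environment.
    Returns (online_rewards, number_of_historical_samples_used).
    """
    unused = defaultdict(list)
    for arm, reward in history:
        unused[arm].append(reward)

    online_rewards = []
    used = 0
    while len(online_rewards) < horizon:
        arm = base.select_arm()
        if unused[arm]:
            # A historical sample exists for the proposed arm: consume it
            # instead of spending an online round.
            base.update(arm, unused[arm].pop())
            used += 1
        else:
            # No history left for this arm: play it online.
            reward = pull(arm)
            base.update(arm, reward)
            online_rewards.append(reward)
    return online_rewards, used


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
    # Imbalanced history: every sample comes from the worst arm.
    history = [(0, float(rng.random() < means[0])) for _ in range(200)]
    rewards, used = artificial_replay(
        UCB1(len(means)), history,
        pull=lambda a: float(rng.random() < means[a]), horizon=500,
    )
    print(f"historical samples used: {used} of {len(history)}")
```

In this sketch a full warm start would load all 200 historical samples up front, whereas the replay loop stops consuming samples for an arm once the base algorithm stops proposing it. This mirrors the data-efficiency claim in the abstract without reproducing the paper's experiments.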

Code Implementations: 1 repo
