LGDSOct 28, 2022

Non-Stationary Bandits with Auto-Regressive Temporal Dependency

arXiv:2210.16386v318 citationsh-index: 25
Originality Highly original
AI Analysis

This addresses the challenge of temporal dynamics in real-world applications like recommendation systems and online advertising, representing a novel method for a known bottleneck.

The paper tackles the problem of non-stationary multi-armed bandits with temporal dependencies by introducing a novel framework with an auto-regressive reward structure and an algorithm that balances exploration-exploitation and discards outdated information, achieving a regret upper bound nearly matching the lower bound and demonstrating efficacy in a tourism demand prediction case study.

Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes