LG MLOct 6, 2018

Learning to Optimize under Non-Stationarity

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

arXiv:1810.03024v625.0168 citations

Originality Highly original

AI Analysis

It addresses dynamic optimization challenges in applications like pricing and ads allocation for practitioners in changing environments, representing a strong specific gain.

The paper tackles the problem of optimizing under non-stationarity in linear stochastic bandits, achieving state-of-the-art dynamic regret bounds with algorithms like SW-UCB and BOB, such as optimal O~(d^{2/3}(B_T+1)^{1/3}T^{2/3}) and best O~(d^{2/3}(B_T+1)^{1/4}T^{3/4}) bounds.

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandits learning algorithms. Defining $d,B_T,$ and $T$ as the problem dimension, the \emph{variation budget}, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning free bandit-over-bandit (\texttt{BOB}) framework built on top of the \texttt{SW-UCB} algorithm with best $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.

View on arXiv PDF

Similar