Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits
This work addresses the challenge of adapting to changing environments in sequential decision-making problems, such as task offloading in wireless edge computing and portfolio optimization, and offers incremental improvements in change-detection capability.
The paper tackles non-stationary multi-armed bandits by proposing a Kolmogorov-Smirnov test-based Thompson Sampling algorithm (TS-KS) that actively detects change points and resets its parameters when a change is found. The authors show that TS-KS has sub-linear regret, outperforms other non-stationary bandit algorithms, and performs comparably to state-of-the-art forecasting methods such as Facebook-PROPHET and ARIMA.
We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test-based Thompson Sampling (TS) algorithm, named TS-KS, that actively detects change points and resets the TS parameters once a change is detected. In particular, for the two-armed bandit case, we derive bounds on the number of samples of the reward distribution needed to detect a change once it occurs. Consequently, we show that the proposed algorithm has sub-linear regret. Unlike existing works, our algorithm is able to detect a change in the underlying reward distribution even when the mean reward remains the same. Finally, to test the efficacy of the proposed algorithm, we employ it in two case studies: i) a task-offloading scenario in wireless edge computing, and ii) portfolio optimization. Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments. Moreover, the performance of TS-KS is on par with state-of-the-art forecasting algorithms such as Facebook-PROPHET and ARIMA.
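The detect-and-reset idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian posterior used for sampling, the fixed-size reference and recent windows, the asymptotic KS threshold, and all names (`ks_statistic`, `ks_threshold`, `change_detected`, `ArmState`, `run_ts_ks`, `window`, `alpha`) are assumptions made for illustration only.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_threshold(n, m, alpha):
    """Asymptotic rejection threshold of the two-sample KS test at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def change_detected(reference, recent, alpha=0.01):
    """Flag a change when the KS statistic exceeds the threshold."""
    d = ks_statistic(reference, recent)
    return d > ks_threshold(len(reference), len(recent), alpha)


class ArmState:
    """Per-arm statistics (assumed Gaussian TS) plus two reward windows."""

    def __init__(self, window):
        self.window = window
        self.reset()

    def reset(self):
        self.count = 0
        self.mean = 0.0
        self.reference = []  # first `window` rewards since the last reset
        self.recent = []     # latest `window` rewards seen after the reference

    def sample(self, rng):
        # Assumed posterior draw N(mean, 1 / (count + 1)).
        return rng.gauss(self.mean, 1.0 / math.sqrt(self.count + 1))

    def update(self, reward):
        self.count += 1
        self.mean += (reward - self.mean) / self.count
        if len(self.reference) < self.window:
            self.reference.append(reward)
        else:
            self.recent = (self.recent + [reward])[-self.window:]

    def changed(self, alpha):
        full = len(self.recent) == self.window
        return full and change_detected(self.reference, self.recent, alpha)


def run_ts_ks(reward_fn, n_arms, horizon, window=50, alpha=0.001, seed=0):
    """Play `horizon` rounds; reset every arm whenever the KS test fires.

    `reward_fn(arm, t, rng)` is a hypothetical callback returning a reward.
    Returns the list of rounds at which a change was flagged.
    """
    rng = random.Random(seed)
    arms = [ArmState(window) for _ in range(n_arms)]
    detections = []
    for t in range(horizon):
        choice = max(range(n_arms), key=lambda k: arms[k].sample(rng))
        arms[choice].update(reward_fn(choice, t, rng))
        if arms[choice].changed(alpha):
            detections.append(t)
            for arm in arms:
                arm.reset()
    return detections


# Same mean (0), different spread: a mean-based detector would see nothing.
narrow = [-0.1, 0.1] * 50
wide = [-1.0, 1.0] * 50
same_mean_change_found = change_detected(narrow, wide, alpha=0.01)
```

In this toy example the two samples share the same mean, yet their empirical CDFs differ by 0.5, which exceeds the threshold of roughly 0.23 for two samples of size 100 at `alpha=0.01`, so `same_mean_change_found` is `True`. Running the test after every pull, as `run_ts_ks` does, inflates the false-alarm rate, so a practical version would likely need a stricter `alpha` or less frequent testing.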