LG · Mar 12, 2023

Energy Regularized RNNs for Solving Non-Stationary Bandit Problems

arXiv:2303.06552v21 citations · h-index: 38 · Has Code
AI Analysis

This work addresses non-stationary bandit problems for reinforcement learning applications, representing an incremental improvement with a novel regularization approach.

The paper tackles non-stationary bandit problems in which rewards depend on past actions and, potentially, on past contexts. It models these sequences with a recurrent neural network and adds an energy regularization term to balance exploration and exploitation. On benchmark problems, the method is reported to be at least as effective as methods designed for the Rotting Bandits sub-problem.

We consider a Multi-Armed Bandit problem in which the rewards are non-stationary and are dependent on past actions and potentially on past contexts. At the heart of our method, we employ a recurrent neural network, which models these sequences. In order to balance between exploration and exploitation, we present an energy minimization term that prevents the neural network from becoming too confident in support of a certain action. This term provably limits the gap between the maximal and minimal probabilities assigned by the network. In a diverse set of experiments, we demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits, and can solve intuitive extensions of various benchmark problems. We share our implementation at https://github.com/rotmanmi/Energy-Regularized-RNN.
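The effect described in the abstract, a penalty that keeps the network from becoming too confident in one action, can be illustrated with a small sketch. The energy used here (half the squared norm of the logits) is an assumption chosen for illustration and is not taken from the paper; the names `energy_penalty`, `prob_ratio_bound`, and `train_logits` are hypothetical. The sketch relies only on a standard softmax fact: the ratio between the largest and smallest action probabilities equals exp(z_max - z_min), which is at most exp(sqrt(2) * ||z||), so keeping the logit norm small limits the gap. For the authors' actual term, see the linked implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def energy_penalty(z):
    # ASSUMPTION: illustrative energy, half the squared L2 norm of the logits.
    # This is not the paper's definition.
    return 0.5 * np.sum(z ** 2)

def prob_ratio_bound(z):
    # p_max / p_min = exp(z_max - z_min) <= exp(sqrt(2) * ||z||_2).
    return np.exp(np.sqrt(2.0) * np.linalg.norm(z))

def train_logits(lam, steps=2000, lr=0.1, n_arms=4, best=0):
    # Gradient descent on: -log softmax(z)[best] + lam * energy_penalty(z).
    # A toy stand-in for a policy that is always rewarded for the same arm.
    z = np.zeros(n_arms)
    onehot = np.eye(n_arms)[best]
    for _ in range(steps):
        p = softmax(z)
        grad = (p - onehot) + lam * z  # cross-entropy gradient + penalty gradient
        z -= lr * grad
    return z

z_plain = train_logits(lam=0.0)  # no regularization: confidence keeps growing
z_reg = train_logits(lam=0.5)    # with the penalty: logits stay bounded
p_plain, p_reg = softmax(z_plain), softmax(z_reg)
print("min prob without penalty:", p_plain.min())
print("min prob with penalty:   ", p_reg.min())
```

With the penalty switched on, the least-likely arm keeps a noticeably larger probability, so it continues to be explored. This matters in the non-stationary setting, where an arm that was best earlier may later decay.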
