LG AI SY ML · Feb 4, 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

arXiv:2302.02061v2 · 11.513 citations · h-index: 73

Originality: Incremental advance

AI Analysis

This work addresses the challenge of handling evolving user behavior in recommendation systems, representing an incremental advance by extending contextual MDPs to history-dependent settings.

The authors address reinforcement learning in non-Markov environments with history-dependent contexts by introducing Dynamic Contextual Markov Decision Processes (DCMDPs). They establish regret bounds for an upper-confidence-bound style algorithm on logistic DCMDPs and demonstrate the approach on a MovieLens recommendation task.

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
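The abstract's point about aggregation can be illustrated with a small sketch. This is not the authors' algorithm; it is one possible reading, assuming that history is summarized by a discounted running sum of per-step feature vectors and that the next-context distribution is a softmax over that sum. The names `update_aggregate`, `context_distribution`, `toy_features`, and the discount value are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_aggregate(agg, features, discount=0.9):
    # Assumed aggregation rule: discounted running sum of step features.
    # The summary keeps a fixed size no matter how long the history is.
    return [discount * a + f for a, f in zip(agg, features)]

def context_distribution(agg):
    # Assumed logistic (softmax) link from aggregate to next-context probabilities.
    return softmax(agg)

def toy_features(action, num_contexts):
    # Hypothetical feature map: recommending item k nudges the user toward context k.
    return [1.0 if i == action else 0.0 for i in range(num_contexts)]

num_contexts = 3
agg = [0.0] * num_contexts
for action in [0, 0, 2, 0]:  # a short, made-up recommendation history
    agg = update_aggregate(agg, toy_features(action, num_contexts))

probs = context_distribution(agg)
print(probs)
```

Under these assumptions, the planner only needs the fixed-size vector `agg` rather than the full history, which is the intuition behind avoiding an exponential dependence on history length.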
