LG · AI · Feb 14, 2022

Reinforcement Learning in Presence of Discrete Markovian Context Evolution

arXiv:2202.06557v1 · 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses a challenging RL setting common in applications and offers a novel solution for handling unknown and changing contexts, though it appears incremental in that it combines existing Bayesian and RL techniques.

The paper tackles the problem of reinforcement learning with discrete, unobservable Markovian contexts that change abruptly during episodes, using a Bayesian approach with a sticky HDP prior and context distillation to infer the number of contexts and enable efficient policy learning. It demonstrates empirical success in gym environments where state-of-the-art methods fail.

We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components makes it possible to infer the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using the gym environments cart-pole swing-up, drone, and intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail, and we elaborate on the reasons for such failures.
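To make the sticky HDP prior and the Markovian context evolution more concrete, here is a minimal sketch of the standard truncated sticky HDP generative form: global weights drawn by stick-breaking, then each transition row drawn from a Dirichlet with extra mass on the self-transition. The function names, the truncation level `K`, and the hyperparameters `gamma`, `alpha`, and `kappa` are assumptions chosen for illustration. The sketch does not reproduce the paper's parameterization, its variational inference, or its context distillation procedure.

```python
import numpy as np


def sample_sticky_hdp_transitions(K, gamma=1.0, alpha=1.0, kappa=10.0, seed=0):
    """Draw a K x K transition matrix from a truncated sticky HDP prior.

    K, gamma, alpha and kappa are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Truncated stick-breaking draw of the global context weights beta.
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0  # close the stick so the weights sum to one
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # Row j ~ Dirichlet(alpha * beta + kappa * e_j): kappa adds mass to the
    # self-transition, which makes contexts persist ("sticky").
    P = np.vstack([
        rng.dirichlet(alpha * beta + kappa * np.eye(K)[j]) for j in range(K)
    ])
    return beta, P


def simulate_contexts(P, T, z0=0, seed=1):
    """Simulate a discrete Markov chain of contexts of length T."""
    rng = np.random.default_rng(seed)
    z = np.empty(T, dtype=int)
    z[0] = z0
    for t in range(1, T):
        # The next context depends only on the current one.
        z[t] = rng.choice(P.shape[0], p=P[z[t - 1]])
    return z


if __name__ == "__main__":
    beta, P = sample_sticky_hdp_transitions(K=8)
    z = simulate_contexts(P, T=200)
    print("global weights:", np.round(beta, 3))
    print("contexts visited:", np.unique(z))
```

With a large `kappa` relative to `alpha`, the sampled chain tends to stay in one context for long stretches and switch abruptly. This matches the setting described above, where context changes are discontinuous and occur within an episode.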

