LG, SY, ML | Nov 6, 2022

On learning history based policies for controlling Markov decision processes

arXiv:2211.03011v1 | 7 citations | h-index: 65
Originality: Synthesis-oriented
AI Analysis

This work addresses a theoretical gap for researchers in reinforcement learning, though it appears incremental as it builds on existing folklore without claiming major breakthroughs.

The paper tackles the lack of formal analysis for history-based reinforcement learning methods by introducing a theoretical framework to study their behavior in controlling Markov decision processes, and it designs a practical algorithm that is numerically evaluated on continuous control tasks.

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.
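The claim that function approximation can induce partial observability can be illustrated with a toy example. The sketch below is not the paper's framework or algorithm. It assumes a hypothetical four-state deterministic chain, a hypothetical state-aggregation map `phi`, and a sliding window of the last `k` (action, abstract observation) pairs as a simple stand-in for a learned history-based feature such as a recurrent network's hidden state.

```python
from collections import deque


def phi(state):
    """Hypothetical memory-less abstraction: merges states {0, 1} and {2, 3}."""
    return state // 2


def step(state, action):
    """Toy deterministic chain on states 0..3; action is -1 or +1 (assumed dynamics)."""
    return min(3, max(0, state + action))


def rollout(start, actions, k=2):
    """Run the chain and return (final state, memory-less feature, history feature).

    The history feature is the last k (previous action, abstract observation)
    pairs, a simple illustrative choice rather than a learned representation.
    """
    window = deque(maxlen=k)
    state = start
    window.append((None, phi(state)))  # no action precedes the first observation
    for action in actions:
        state = step(state, action)
        window.append((action, phi(state)))
    return state, phi(state), tuple(window)


# Two trajectories with identical actions that end in different true states.
state_a, obs_a, hist_a = rollout(3, [-1, -1])
state_b, obs_b, hist_b = rollout(2, [-1, -1])

print(state_a, obs_a, hist_a)
print(state_b, obs_b, hist_b)
```

In this toy case the two rollouts end in different underlying states that `phi` maps to the same abstract observation, so an agent using only `phi` cannot tell them apart, while the windowed history feature differs. A finite window does not recover the true state in general; the example only shows why history can carry information that a memory-less feature discards.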

