LG SY ML · Nov 6, 2022

On learning history based policies for controlling Markov decision processes

Gandharv Patil, Aditya Mahajan, Doina Precup

arXiv:2211.03011v18.77 citations · h-index: 65

Originality: Synthesis-oriented

AI Analysis

This work addresses a theoretical gap for researchers in reinforcement learning, though it appears incremental as it builds on existing folklore without claiming major breakthroughs.

The paper tackles the lack of formal analysis for history-based reinforcement learning methods by introducing a theoretical framework to study their behavior in controlling Markov decision processes, and it designs a practical algorithm that is numerically evaluated on continuous control tasks.

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.
