LG · SY · ML · Sep 17, 2018

Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

arXiv:1809.06401v2 · 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of applying Q-learning in partially observable environments, a problem relevant to reinforcement learning practitioners. The contribution appears incremental, since it builds on existing HMM and Q-learning methods.

The paper tackles the poor performance of Q-learning in partially observable Markov decision processes (POMDPs) by proposing an online hidden Markov model (HMM) estimation-based Q-learning algorithm. It shows that the POMDP estimation converges to stationary points and that the Q function converges to a fixed point satisfying the Bellman optimality equation.

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates both the POMDP parameters and the Q function concurrently. We also show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and that the Q function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
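The idea of maintaining a state belief and weighting the Q update by it can be illustrated with a minimal sketch. This is not the paper's recursive estimator: it assumes the transition and observation matrices are already known (the paper estimates them online), and it swaps in a simple belief-weighted tabular update for illustration. The function names, array layouts, and the step size `alpha` and discount `gamma` are all hypothetical choices.

```python
import numpy as np

def belief_update(belief, P_a, O, obs):
    """One HMM forward-filter step (assumes known model parameters).

    belief: (S,) current state belief
    P_a:    (S, S) transition matrix for the action taken, rows sum to 1
    O:      (S, num_obs) observation likelihoods, O[s, o] = Pr(o | s)
    obs:    index of the observation received
    """
    predicted = belief @ P_a          # predict the next-state distribution
    unnorm = predicted * O[:, obs]    # weight by the observation likelihood
    return unnorm / unnorm.sum()      # renormalise to a probability vector

def q_update(Q, belief, action, reward, next_belief, alpha=0.1, gamma=0.95):
    """Belief-weighted tabular Q step (illustrative, not the paper's rule).

    Q: (S, A) table; each state's TD correction is scaled by its belief mass.
    """
    # Value of the greedy action when Q is averaged over the next belief.
    target = reward + gamma * np.max(next_belief @ Q)
    td_error = target - Q[:, action]
    Q[:, action] += alpha * belief * td_error
    return Q
```

For example, with two states, a belief of `[0.5, 0.5]`, a symmetric transition matrix, and an observation that is four times more likely in state 0 than in state 1, `belief_update` shifts the belief to `[0.8, 0.2]`.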

