LG, SY, OC, ML · Oct 2, 2020

POMDPs in Continuous Time and Discrete Spaces

arXiv:2010.01014v3 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses a gap in mathematical modeling for decision-making in discrete event systems and population dynamics, though it is incremental as it builds on existing filtering and control theories.

The authors tackled the problem of optimal decision-making in partially observable systems with discrete states and actions evolving in continuous time, which lacked a mathematical description. They derived a Hamilton-Jacobi-Bellman equation for continuous-time POMDPs and solved it using deep learning, demonstrating applicability on toy examples.

Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description for simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partially observable Markov decision process (POMDP). By leveraging optimal filtering theory, we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach that solves the decision problem offline by learning an approximation of the value function and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning. We show the applicability on a set of toy examples, which pave the way for future methods providing solutions for high-dimensional problems.
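
To make the setting concrete, the following is a minimal sketch of belief tracking for a finite-state process in continuous time. It is not the authors' method: the two-state rate matrices in `RATES`, the explicit Euler integration, and the assumption that observations arrive at discrete time points with known likelihoods are all hypothetical choices made for illustration. Between observations the belief follows the master equation db/dt = b Q_a for the chosen action a; at an observation it is reweighted by Bayes' rule.

```python
import random

# Hypothetical example: RATES[a][i][j] is the jump rate from state i to
# state j under action a. Each row sums to zero (a generator matrix).
RATES = {
    0: [[-1.0, 1.0], [2.0, -2.0]],
    1: [[-3.0, 3.0], [0.5, -0.5]],
}


def predict_belief(belief, action, duration, steps=1000):
    """Propagate the belief with db/dt = b Q_a using explicit Euler steps."""
    q = RATES[action]
    n = len(belief)
    dt = duration / steps
    b = list(belief)
    for _ in range(steps):
        db = [sum(b[i] * q[i][j] for i in range(n)) for j in range(n)]
        b = [b[j] + dt * db[j] for j in range(n)]
    return b


def update_belief(belief, likelihoods):
    """Bayes update at an observation time: b_j is proportional to b_j * p(y | j)."""
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def simulate_jump(state, action, rng):
    """One Gillespie step: sample an exponential holding time, then the next state."""
    row = RATES[action][state]
    exit_rate = -row[state]
    hold = rng.expovariate(exit_rate)
    targets = [j for j in range(len(row)) if j != state]
    weights = [row[j] for j in targets]
    next_state = rng.choices(targets, weights=weights)[0]
    return hold, next_state
```

A controller in this setting would map the current belief to an action; the sketch only covers the filtering side and the simulation of the hidden state, not the HJB equation or the learned value function described in the abstract.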

