AI · Jul 27, 2014

MDPs with Unawareness

Joseph Y. Halpern, Nan Rong, Ashutosh Saxena

arXiv:1407.7191v1 · 9 citations

Originality: Incremental advance

AI Analysis

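The paper addresses a limitation of the standard MDPs used in robotics, automated control, and economics: they assume the decision maker knows every state and action. It models scenarios where that awareness is incomplete, though the contribution appears incremental because it extends the traditional MDP framework.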

The paper tackles decision-making in Markov decision processes where the decision maker may be unaware of some of the possible actions. It introduces MDPs with unawareness (MDPUs), characterizes when near-optimal play can be learned, gives an algorithm that learns it when this is possible, and identifies the conditions under which a near-optimal solution can be found in polynomial time.

Markov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs), to deal with the possibility that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally, as efficiently as possible, whenever it is possible to do so. In particular, we characterize when a near-optimal solution can be found in polynomial time.
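To make the setting concrete, here is a minimal Python sketch of a decision maker who knows only a subset of the actions and may discover more by exploring. It is a toy illustration, not the paper's formal definition or algorithm: the class name `ToyMDPU`, the `explore` method, and the fixed `discover_prob` parameter are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyMDPU:
    """Toy model (assumption): an MDP whose DM is aware of only some actions."""
    all_actions: set            # every action in the underlying MDP
    known_actions: set          # actions the DM is currently aware of
    discover_prob: float = 0.5  # hypothetical chance that exploring reveals an action

    def available(self):
        # The DM can only choose among actions it is aware of.
        return set(self.known_actions)

    def explore(self, rng):
        """Try to discover one unknown action; return it, or None on failure."""
        unknown = sorted(self.all_actions - self.known_actions)
        if unknown and rng.random() < self.discover_prob:
            found = rng.choice(unknown)
            self.known_actions.add(found)
            return found
        return None


# Usage: the DM starts out aware of a single action and explores for more.
rng = random.Random(0)
mdpu = ToyMDPU(all_actions={"left", "right", "jump"},
               known_actions={"left"},
               discover_prob=1.0)
mdpu.explore(rng)
print(sorted(mdpu.available()))
```

In this sketch, whether the DM can ever play near-optimally depends on whether exploration eventually reveals the actions that matter, which is the kind of question the abstract says the paper characterizes.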
