AI · Feb 27, 2019

Learning Factored Markov Decision Processes with Unawareness

arXiv:1902.10619v12.0

Originality: Incremental advance

AI Analysis

This paper addresses sequential decision-making in AI and robotics when the agent may not know all possible states and actions in advance. It is an incremental advance over existing methods.

The paper tackles the problem of learning factored Markov decision processes when the agent is initially unaware of factors critical to success. It combines domain exploration with expert assistance and guarantees convergence to near-optimal behavior. In experiments, the agent learns optimal behavior on both small and large problems, and it converges faster when it conserves previously learned information upon discovering new possibilities.

Methods for learning and planning in sequential decision problems often assume the learner is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we give a method to learn factored Markov decision problems from both domain exploration and expert assistance, which guarantees convergence to near-optimal behaviour, even when the agent begins unaware of factors critical to success. Our experiments show our agent learns optimal behaviour on small and large problems, and that conserving information on discovering new possibilities results in faster convergence.

