AI, LG · Sep 5, 2015

Reinforcement Learning with Parameterized Actions

Warwick Masson, Pravesh Ranchod, George Konidaris

arXiv:1509.01644v433.3256 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of handling complex action spaces in reinforcement learning, in domains such as robotics and games. The contribution appears incremental, since it builds on existing MDP frameworks.

The paper tackles reinforcement learning in environments with parameterized actions, where the agent must choose both a discrete action and the continuous parameters for that action. It introduces the Q-PAMDP algorithm, shows that it converges to a local optimum, and compares it with direct policy search in the goal-scoring and Platform domains.

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
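
To make the two-part choice in the abstract concrete, here is a minimal sketch of selecting a parameterized action: first a discrete action, then the continuous parameters for it. It only illustrates the shape of the action space and is not the paper's Q-PAMDP algorithm. Every name in it (`select_parameterized_action`, `q_values`, `param_policies`, the `kick` and `move` actions) is a hypothetical chosen for illustration, and the epsilon-greedy rule is an assumption rather than something the abstract specifies.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types and names; not taken from the paper.
State = Tuple[float, ...]
ParamPolicy = Callable[[State], List[float]]


def select_parameterized_action(
    state: State,
    q_values: Dict[str, float],
    param_policies: Dict[str, ParamPolicy],
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[float]]:
    """Pick a discrete action epsilon-greedily, then its continuous parameters."""
    rng = rng or random.Random()
    actions = sorted(q_values)
    if rng.random() < epsilon:
        # Explore: choose a discrete action uniformly at random.
        action = rng.choice(actions)
    else:
        # Exploit: choose the discrete action with the highest value estimate.
        action = max(actions, key=lambda a: q_values[a])
    # Each discrete action has its own policy mapping the state to parameters.
    return action, param_policies[action](state)


# Illustrative usage with made-up actions and value estimates.
policies: Dict[str, ParamPolicy] = {
    "kick": lambda s: [s[0] * 2.0, 0.5],  # e.g. two continuous parameters
    "move": lambda s: [s[1]],             # e.g. one continuous parameter
}
action, params = select_parameterized_action(
    (1.0, 3.0), {"kick": 0.7, "move": 0.2}, policies, epsilon=0.0
)
```

With `epsilon=0.0` the choice is purely greedy, so the call returns the highest-valued discrete action together with the parameters its policy produces for the given state.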

