AI · LG · Sep 5, 2015

Reinforcement Learning with Parameterized Actions

arXiv:1509.01644v4 · 256 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of handling complex action spaces in reinforcement learning, which arise in domains such as robotics and games. The contribution appears incremental, since it builds on existing MDP frameworks.

The paper tackles reinforcement learning in environments with parameterized actions, where the agent must choose both a discrete action and the continuous parameters for that action. It introduces the Q-PAMDP algorithm, shows that the algorithm converges to a local optimum, and compares it to direct policy search in the goal-scoring and Platform domains.

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
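The two-part choice described in the abstract (pick a discrete action, then pick the continuous parameters for that action) can be illustrated with a minimal Python sketch. This is not the Q-PAMDP algorithm itself. The action names, `ParameterizedAction`, `select_action`, the epsilon-greedy rule, and the Gaussian parameter noise are all assumptions made for illustration and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action name paired with its continuous parameters."""
    name: str
    params: Tuple[float, ...]


def select_action(
    q_values: Dict[str, float],
    param_means: Dict[str, Tuple[float, ...]],
    epsilon: float = 0.1,
    noise_std: float = 0.05,
    rng: Optional[random.Random] = None,
) -> ParameterizedAction:
    """Choose a discrete action, then sample that action's parameters.

    q_values: hypothetical value estimate per discrete action in the current state.
    param_means: hypothetical mean parameter vector per discrete action.
    """
    rng = rng or random.Random()
    names = sorted(q_values)  # fixed order so tie-breaking is deterministic

    # Step 1: discrete choice (epsilon-greedy is an assumption of this sketch).
    if rng.random() < epsilon:
        name = rng.choice(names)
    else:
        name = max(names, key=q_values.get)

    # Step 2: continuous choice, Gaussian exploration around the chosen
    # action's mean parameters (also an assumption of this sketch).
    params = tuple(rng.gauss(mean, noise_std) for mean in param_means[name])
    return ParameterizedAction(name, params)


# Illustrative values only; the action names are hypothetical.
q = {"run": 0.2, "hop": 0.9, "leap": 0.1}
means = {"run": (1.0,), "hop": (0.5,), "leap": (0.3, 0.7)}
print(select_action(q, means, rng=random.Random(0)))
```

Note that each discrete action carries its own parameter vector, possibly of a different length, so the agent's choice cannot be reduced to a single flat discrete or continuous action space.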

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
