Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space
This work addresses a domain-specific challenge in reinforcement learning for tasks with parameterized actions, representing an incremental improvement over existing methods.
The paper tackles sample-efficient end-to-end training in deep reinforcement learning for parameterized action spaces by proposing a new compact architecture and two training methods based on TRPO and SVG. Both methods outperform the prior state-of-the-art method, Parameterized Action DDPG, on the test domains.
We explore deep reinforcement learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for these tasks in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods, based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG), to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
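The conditioning described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the single linear layers, the Gaussian parameter distribution with a fixed log standard deviation, and the names `HierarchicalPolicy`, `init_layer`, and `act` are all assumptions made for illustration. The sketch also assumes that "the output of the discrete action policy" means the action-probability vector, which is concatenated with the state; feeding a one-hot encoding of the sampled action would be another plausible reading.

```python
import math
import random


def linear(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in, scale=0.1):
    """Hypothetical helper: small random weights and a zero bias."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class HierarchicalPolicy:
    """Illustrative two-level policy: discrete action first, then parameters."""

    def __init__(self, state_dim, n_actions, param_dim, seed=0):
        self.rng = random.Random(seed)
        # Discrete action policy: state -> action logits.
        self.discrete = init_layer(self.rng, n_actions, state_dim)
        # Parameter policy: [state, action probabilities] -> parameter mean.
        self.param = init_layer(self.rng, param_dim, state_dim + n_actions)
        # Assumption: state-independent log standard deviation.
        self.log_std = [-1.0] * param_dim

    def act(self, state):
        # Level 1: distribution over discrete actions, then a sample from it.
        probs = softmax(linear(*self.discrete, state))
        action = self.rng.choices(range(len(probs)), weights=probs)[0]
        # Level 2: the parameter policy sees the state and the discrete output.
        mean = linear(*self.param, list(state) + probs)
        params = [self.rng.gauss(mu, math.exp(ls))
                  for mu, ls in zip(mean, self.log_std)]
        return action, params, probs
```

Calling `HierarchicalPolicy(state_dim=4, n_actions=3, param_dim=2).act([0.1, -0.2, 0.3, 0.0])` returns a discrete action index, a list of two continuous parameters, and the action probabilities. Because the parameter head takes the discrete policy's output as input, one network produces both parts of the action. In a differentiable implementation this would let gradients from the parameter head reach the discrete policy, which is one plausible reading of the end-to-end training the abstract refers to.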