ML AI LG RO SY

May 24, 2018

A0C: Alpha Zero in Continuous Action Space

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

arXiv:1805.09613v1

15.353 citations

Originality: Incremental advance

AI Analysis

This work addresses a key limitation for applying advanced reinforcement learning methods to real-world domains like robotics, though it is an incremental step focused on theoretical extensions and initial validation.

The paper tackles the problem of applying Alpha Zero's interleaved tree search and deep learning to continuous action spaces, such as those in robotic control. It provides theoretical extensions and demonstrates feasibility with preliminary experiments on the Pendulum swing-up task.

A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.
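To make the difficulty concrete: a tree search over discrete actions can enumerate every child of a node, but a continuous action space has infinitely many. The following is a minimal sketch of progressive widening, one common way to limit how many sampled actions a node may hold. The abstract does not say which mechanism the paper uses, so the technique choice, the names `Node`, `should_widen`, and `select_or_expand`, the constants `c_pw` and `kappa`, and the torque range are all assumptions made for illustration.

```python
import random


class Node:
    """A search-tree node whose children are sampled continuous actions."""

    def __init__(self):
        self.visits = 0
        self.children = {}  # action (float) -> child Node


def should_widen(node, c_pw=1.0, kappa=0.5):
    """Progressive widening test: allow a new child while
    len(children) < c_pw * n**kappa, with n the node's visit count.
    c_pw and kappa are illustrative values, not taken from the paper."""
    return len(node.children) < c_pw * max(node.visits, 1) ** kappa


def select_or_expand(node, sample_action, rng):
    """Sample a new action if widening allows, else revisit an existing child."""
    if should_widen(node):
        action = sample_action(rng)
        node.children[action] = Node()
    else:
        # Placeholder rule: pick the least-visited child.
        # A real search would use a UCT-style score here.
        action = min(node.children, key=lambda a: node.children[a].visits)
    node.visits += 1
    node.children[action].visits += 1
    return action


rng = random.Random(0)
root = Node()
# Assumed torque range [-2, 2], as in common Pendulum implementations.
for _ in range(100):
    select_or_expand(root, lambda r: r.uniform(-2.0, 2.0), rng)
```

With these settings the number of children grows roughly as the square root of the visit count, so the search keeps refining a small set of actions rather than spreading every simulation over a fresh sample.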
