LG | AI | Nov 30, 2017

Learnings Options End-to-End for Continuous Action Tasks

arXiv:1712.00004v1 | 62 citations
Originality Synthesis-oriented
AI Analysis

This work addresses improving reinforcement learning efficiency for continuous control tasks, but it appears incremental as it builds on existing methods like option-critic and PPO.

The paper tackles learning temporally extended actions for continuous tasks using the options framework. It reports promising results on MuJoCo domains but raises questions about when a given option should be used, an issue tied to initiation sets.

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on MuJoCo domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.
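The abstract names two ingredients without giving formulas: the PPO clipped surrogate objective and a deliberation cost applied to option termination. The sketch below illustrates both in their commonly stated forms from the cited PPO and option-critic literature. It is not the authors' implementation. The function names, the `clip_eps` default of 0.2, and the scalar inputs are assumptions made for illustration only.

```python
def ppo_clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios: new-policy / old-policy probability ratios, one per sample.
    advantages: advantage estimates, one per sample.
    clip_eps: clipping range (hypothetical default, not from the paper).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def termination_advantage(q_option, v_state, deliberation_cost):
    """Advantage of staying with the current option, shifted by the cost.

    q_option: value of continuing the current option in the next state.
    v_state: value of the next state over all options.
    deliberation_cost: non-negative margin; larger values favor
        continuing the current option over switching.
    """
    return q_option - v_state + deliberation_cost


if __name__ == "__main__":
    # A ratio above 1 + eps with a positive advantage is clipped.
    print(ppo_clipped_surrogate([1.5], [1.0]))
    # With equal values, the cost alone tips the balance toward continuing.
    print(termination_advantage(1.0, 1.0, 0.1))
```

In this sketch, a positive termination advantage argues for lowering the termination probability of the current option, so the deliberation cost acts as a margin that discourages frequent switching.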

Code Implementations: 3 repos
