LG NE Oct 30, 2015

Learning Continuous Control Policies by Stochastic Value Gradients

arXiv:1510.09142v1
598 citations
Originality: Incremental advance
AI Analysis

This work addresses stochastic control in continuous domains, with experiments on a toy problem and on physics-based control tasks in simulation. It is rated an incremental advance over existing methods.

The authors introduce a unified framework for learning continuous control policies by backpropagation. It handles stochastic control by expressing stochasticity as a deterministic function of exogenous noise, which yields a spectrum of policy gradient algorithms ranging from model-free methods with value functions to model-based methods without them. The algorithms are evaluated in simulation; one variant, SVG(1), learns a model, a value function, and a policy simultaneously.

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
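The idea of treating stochasticity as a deterministic function of exogenous noise can be illustrated on a one-step toy problem. The sketch below is not one of the paper's SVG algorithms: the quadratic reward, the Gaussian policy `action = theta + sigma * eta`, and every function name are assumptions chosen for illustration. It only shows that once the noise `eta` is sampled separately, the action is a deterministic, differentiable function of the policy parameter, so the gradient of the expected reward can be estimated by averaging ordinary derivatives.

```python
import random


def reward_grad(action, target=1.0):
    # Derivative of the assumed toy reward r(a) = -(a - target)^2 w.r.t. the action.
    return -2.0 * (action - target)


def pathwise_gradient(theta, sigma=0.5, target=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of d/dtheta E[r(theta + sigma * eta)], eta ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eta = rng.gauss(0.0, 1.0)      # exogenous noise, independent of theta
        action = theta + sigma * eta   # deterministic in (theta, eta)
        d_action_d_theta = 1.0         # derivative of the line above w.r.t. theta
        # Chain rule through the sampled action.
        total += reward_grad(action, target) * d_action_d_theta
    return total / n_samples


theta = 0.0
estimate = pathwise_gradient(theta)
# For this toy reward, E[r] = -(theta - target)^2 - sigma^2,
# so the exact gradient is -2 * (theta - target).
analytic = -2.0 * (theta - 1.0)
print(f"estimate={estimate:.3f} analytic={analytic:.3f}")
```

In this toy case the estimate can be checked against the closed-form gradient. In a multi-step control problem the same chain rule would also have to pass through a dynamics model and a value function, which this sketch leaves out.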

Code Implementations: 3 repos
