LGOCNov 15, 2017

Costate-focused models for reinforcement learning

arXiv:1711.05817v5
Originality Highly original
AI Analysis

This work addresses time-optimal control problems in reinforcement learning, offering a novel approach that may be incremental compared to existing model-free methods.

The paper tackles the problem of time-optimal control in reinforcement learning by introducing a method based on the costate equation and state dynamics models, which focuses on relevant environmental features and shows strong performance compared to deep deterministic policy gradient on deterministic and stochastic mechanical systems.

Many recent algorithms for reinforcement learning are model-free and founded on the Bellman equation. Here we present a method founded on the costate equation and models of the state dynamics. We use the costate -- the gradient of cost with respect to state -- to improve the policy and also to "focus" the model, training it to detect and mimic those features of the environment that are most relevant to its task. We show that this method can handle difficult time-optimal control problems, driving deterministic or stochastic mechanical systems quickly to a target. On these tasks it works well compared to deep deterministic policy gradient, a recent Bellman method. And because it creates a model, the costate method can also learn from mental practice.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes