LG · Jun 18, 2012

Path Integral Policy Improvement with Covariance Matrix Adaptation

arXiv:1206.4621v1 · 220 citations
Originality: Incremental advance
AI Analysis

This work addresses continuous state and action problems in reinforcement learning, offering an incremental improvement by automating exploration noise tuning for policy optimization.

The paper tackles the problem of optimizing parameterized policies in continuous reinforcement learning by introducing PI2-CMA, a novel algorithm that automatically determines exploration noise magnitude, achieving performance comparable to existing methods like CMA-ES and Cross-Entropy Methods.

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family, namely Cross-Entropy Methods and CMA-ES, at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
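
To make the idea of probability-weighted averaging concrete, here is a minimal sketch of one update step, not the paper's implementation. It rests on several assumptions that do not come from the abstract: each rollout is reduced to a single scalar cost, the covariance is diagonal, costs are mapped to probabilities by exponentiating the range-normalized cost with a hypothetical constant `h`, and the new exploration noise is the probability-weighted spread of the samples around the old mean. The names `weighted_averaging_step` and `demo` are illustrative only.

```python
import math
import random


def weighted_averaging_step(cost, mean, sigma, rng, n_samples=20, h=10.0):
    """One probability-weighted averaging update (illustrative sketch)."""
    dim = len(mean)
    # Explore: sample parameter vectors around the current mean.
    samples = [[rng.gauss(mean[d], sigma[d]) for d in range(dim)]
               for _ in range(n_samples)]
    costs = [cost(s) for s in samples]

    # Assumed cost-to-probability mapping: low cost gives high weight.
    lowest, highest = min(costs), max(costs)
    span = (highest - lowest) or 1.0  # avoid dividing by zero if all costs tie
    weights = [math.exp(-h * (c - lowest) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Parameter update: probability-weighted average of the samples.
    new_mean = [sum(p * s[d] for p, s in zip(probs, samples))
                for d in range(dim)]
    # Noise update: weighted spread of the samples around the OLD mean,
    # so the exploration magnitude needs no hand tuning.
    new_sigma = [math.sqrt(sum(p * (s[d] - mean[d]) ** 2
                               for p, s in zip(probs, samples)))
                 for d in range(dim)]
    return new_mean, new_sigma


def demo(iterations=30, seed=0):
    """Run the update on a toy quadratic cost with its minimum at (3, 3)."""
    def cost(theta):
        return sum((x - 3.0) ** 2 for x in theta)

    rng = random.Random(seed)
    mean, sigma = [0.0, 0.0], [1.0, 1.0]
    initial = cost(mean)
    for _ in range(iterations):
        mean, sigma = weighted_averaging_step(cost, mean, sigma, rng)
    return initial, cost(mean)
```

Calling `demo()` returns the cost at the starting mean and the cost at the mean after the updates. Keeping the mean fixed and only reweighting samples would leave out the adaptation of `sigma`, which is the part the abstract highlights as PI2-CMA's main advantage.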
