SY LG

May 13, 2020

Adaptive Smoothing Path Integral Control

Dominik Thalmeier, Hilbert J. Kappen, Simone Totaro, Vicenç Gómez

arXiv:2005.06364v14.39 citations

Originality: Incremental advance

AI Analysis

The paper addresses a sample-efficiency bottleneck in policy optimization for reinforcement learning. It offers improved efficiency, though the advance is incremental relative to existing methods.

The paper tackles the poor sample efficiency of the Path Integral Cross-Entropy (PICE) method in reinforcement learning by proposing ASPIC, which applies adaptive smoothing to the cost function, and shows that intermediate smoothing levels outperform both PICE and direct cost-optimization.

In Path Integral control problems, a representation of an optimally controlled dynamical system can be formally computed and can serve as a guidepost for learning a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers from disappear for finite levels of smoothing. For zero smoothing, this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost-optimization.
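To make the role of the smoothing level concrete, the following is a minimal sketch of an inf-convolution on a one-dimensional toy cost. It is not the ASPIC algorithm: the abstract does not state the penalty term or the update rule, so the quadratic penalty, the grid search, and the names `inf_convolution`, `toy_cost`, and `smoothing` are all assumptions chosen for illustration. It only shows the two limits the abstract describes: zero smoothing recovers the raw cost, and very large smoothing flattens the cost toward its global minimum.

```python
import numpy as np

def inf_convolution(cost, grid, smoothing):
    """Grid approximation of f_s(x) = min_y [cost(y) + (x - y)^2 / (2 * smoothing)].

    The quadratic penalty is an assumption made for this illustration only.
    """
    values = cost(grid)                      # cost(y) at every grid point y
    diff = grid[:, None] - grid[None, :]     # diff[i, j] = x_i - y_j
    penalty = diff ** 2 / (2.0 * smoothing)  # cost of moving from x_i to y_j
    return np.min(values[None, :] + penalty, axis=1)

def toy_cost(x):
    # Hypothetical non-convex cost with several local minima.
    return 0.1 * x ** 2 + np.sin(3.0 * x)

grid = np.linspace(-4.0, 4.0, 401)
raw = toy_cost(grid)

nearly_raw = inf_convolution(toy_cost, grid, smoothing=1e-6)   # close to zero smoothing
intermediate = inf_convolution(toy_cost, grid, smoothing=0.5)  # finite smoothing
nearly_flat = inf_convolution(toy_cost, grid, smoothing=1e6)   # close to infinite smoothing
```

In this toy setting, `nearly_raw` coincides with the raw cost, `nearly_flat` is almost constant at the global minimum of the raw cost, and `intermediate` lies between the two, filling in shallow local minima while keeping the large-scale shape.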
