RO · SY · DS · Aug 23, 2021

A generalized stacked reinforcement learning method for sampled systems

Pavel Osinenko, Dmitrii Dobriborsci, Grigory Yaremenko, Georgiy Malaniya

arXiv:2108.10392v3 · 3.0

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific challenge in robotics and control systems by providing incremental improvements for reinforcement learning in sampled systems.

The paper tackles the problem of applying reinforcement learning to sampled systems, such as physical systems with continuous-time dynamics. It proposes two hybrid methods that combine model-predictive control with critics for the Q-function and the value function, and compares their performance in a mobile-robot case study.
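One common way to combine model-predictive control with a critic is to truncate the prediction horizon and let a learned cost-to-go estimate stand in for the tail of the objective. The sketch below illustrates that general idea with a value-function critic; it is not the authors' exact formulation. The scalar model, quadratic costs, critic, and brute-force search over a finite candidate set are all hypothetical choices made for illustration, and a Q-function variant would instead score the last predicted state-action pair.

```python
import itertools
from typing import Callable, Sequence, Tuple

def mpc_critic_cost(
    x0: float,
    actions: Sequence[float],
    model: Callable[[float, float], float],
    running_cost: Callable[[float, float], float],
    critic_v: Callable[[float], float],
) -> float:
    """Predicted running cost over the horizon plus a critic estimate of the tail."""
    x = x0
    total = 0.0
    for u in actions:
        total += running_cost(x, u)  # stage cost at the current predicted state
        x = model(x, u)              # one-step prediction with the (assumed) model
    return total + critic_v(x)       # critic replaces the truncated cost-to-go

def best_sequence(
    x0: float,
    candidates: Sequence[Tuple[float, ...]],
    model: Callable[[float, float], float],
    running_cost: Callable[[float, float], float],
    critic_v: Callable[[float], float],
) -> Tuple[float, ...]:
    """Brute-force minimization over a finite candidate set (stand-in for a real solver)."""
    return min(
        candidates,
        key=lambda seq: mpc_critic_cost(x0, seq, model, running_cost, critic_v),
    )

# Hypothetical toy problem: scalar model, quadratic stage cost, quadratic critic.
model = lambda x, u: x + 0.1 * u
running_cost = lambda x, u: x**2 + 0.01 * u**2
critic_v = lambda x: 10.0 * x**2
candidates = list(itertools.product([-1.0, 0.0, 1.0], repeat=2))

best = best_sequence(1.0, candidates, model, running_cost, critic_v)
best_cost = mpc_critic_cost(1.0, best, model, running_cost, critic_v)
print(best, best_cost)
```

In a receding-horizon loop, only the first action of the chosen sequence would be applied before the problem is re-solved at the next sampling instant.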

A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable for applications such as video games or puzzles, physical systems evolve in continuous time. A general variant of RL operates in a digital format, where updates of the value (or cost) and of the policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, of which sample-and-hold is a special case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics that learn the optimal Q-function and the optimal value (or cost-to-go) function. Optimality is analyzed, and the performance of the methods is compared in an experimental case study with a mobile robot.
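The sample-and-hold case mentioned above can be made concrete with a short simulation: the agent observes the state only at sampling instants, and its action is held constant while the continuous-time plant evolves in between. This is a minimal sketch assuming scalar dynamics and forward-Euler integration between samples; the function name, the toy integrator plant, and the proportional policy are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

def simulate_sample_and_hold(
    f: Callable[[float, float], float],  # continuous-time dynamics dx/dt = f(x, u)
    policy: Callable[[float], float],    # agent: maps a sampled state to an action
    x0: float,
    sample_period: float,
    n_samples: int,
    substeps: int = 100,
) -> List[float]:
    """Return the states observed at the sampling instants t_k = k * sample_period."""
    x = x0
    sampled = [x]
    dt = sample_period / substeps
    for _ in range(n_samples):
        u = policy(x)  # the agent acts only at the sampling instant
        for _ in range(substeps):
            x += dt * f(x, u)  # forward-Euler step with u held constant (zero-order hold)
        sampled.append(x)
    return sampled

# Hypothetical example: integrator plant dx/dt = u with sampled feedback u_k = -x_k.
# For this plant, Euler is exact under a held input, so x_{k+1} = (1 - 0.5) * x_k.
traj = simulate_sample_and_hold(
    f=lambda x, u: u, policy=lambda x: -x, x0=1.0, sample_period=0.5, n_samples=3
)
print(traj)
```

Replacing the Euler loop with a higher-order integrator would change only the inner loop; the hold structure, in which one action is applied over the whole sampling period, stays the same.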

