LG · RO · Jun 15, 2023

Simplified Temporal Consistency Reinforcement Learning

arXiv:2306.09466v1 · 22 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency, a practical concern for RL practitioners, but it appears incremental because it builds on existing representation learning approaches.

The paper tackles sample efficiency in reinforcement learning by showing that a simple latent dynamics model trained with temporal consistency alone is enough for high performance. In model-based RL it trains 4.1x faster than ensemble-based methods, and in model-free RL it matches model-based sample efficiency while training 2.4x faster.
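The idea of training a latent dynamics model with temporal consistency alone can be sketched as a multi-step latent prediction loss. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the linear-tanh encoder and dynamics, the per-step discount, the normalized squared error (equal to 2 minus twice the cosine similarity), and the exponential-moving-average (EMA) target encoder are illustrative choices, and every function name is hypothetical. Only the first observation is encoded; the model is rolled forward in latent space and each prediction is compared with a target embedding of the true future observation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def encode(W_enc, obs):
    """Hypothetical linear-tanh encoder: observation batch -> latent batch."""
    return np.tanh(obs @ W_enc)

def predict_next(W_z, W_a, z, a):
    """Hypothetical latent dynamics model: (z_t, a_t) -> predicted z_{t+1}."""
    return np.tanh(z @ W_z + a @ W_a)

def temporal_consistency_loss(params, target_W_enc, obs_seq, act_seq, discount=0.9):
    """Multi-step latent consistency loss (assumed form).

    obs_seq: (H+1, B, obs_dim), act_seq: (H, B, act_dim).
    """
    W_enc, W_z, W_a = params
    z = encode(W_enc, obs_seq[0])  # encode only the first observation
    loss = 0.0
    for k in range(act_seq.shape[0]):
        z = predict_next(W_z, W_a, z, act_seq[k])  # roll forward in latent space
        # Target embedding of the real next observation; in an autodiff
        # framework this would be wrapped in a stop-gradient.
        target = encode(target_W_enc, obs_seq[k + 1])
        err = np.sum((l2_normalize(z) - l2_normalize(target)) ** 2, axis=-1)
        loss += discount ** k * err.mean()
    return loss

def ema_update(target, online, tau=0.005):
    """Slowly track the online encoder weights (assumed target-network scheme)."""
    return (1 - tau) * target + tau * online

# Usage on random data (shapes only; no training loop).
rng = np.random.default_rng(0)
obs_dim, act_dim, lat_dim, H, B = 8, 2, 16, 3, 4
params = (
    rng.normal(size=(obs_dim, lat_dim)),
    0.1 * rng.normal(size=(lat_dim, lat_dim)),
    rng.normal(size=(act_dim, lat_dim)),
)
target_W_enc = params[0].copy()
obs_seq = rng.normal(size=(H + 1, B, obs_dim))
act_seq = rng.uniform(-1.0, 1.0, size=(H, B, act_dim))
loss = temporal_consistency_loss(params, target_W_enc, obs_seq, act_seq)
```

Because the target is an embedding rather than a reconstructed observation, this kind of objective needs no decoder; the EMA target is a common choice in latent self-prediction methods and is assumed here rather than taken from the paper.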

Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies both when using pure planning with a dynamics model conditioned on the representation and when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train than ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
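The abstract notes that the learned representation also supports pure planning with a dynamics model conditioned on it. One common way to plan in a learned latent space is model-predictive control with the cross-entropy method (CEM); the sketch below assumes that planner, together with hypothetical `dynamics(z, a)` and `reward(z, a)` callables standing in for learned networks. The paper's actual online planner and reward model may differ.

```python
import numpy as np

def cem_plan(z0, dynamics, reward, act_dim, horizon=5, pop=256,
             n_elite=32, iters=5, rng=None):
    """Cross-entropy-method planning over action sequences in latent space.

    z0: 1-D latent state. dynamics/reward: hypothetical batched callables
    taking (pop, latent_dim) and (pop, act_dim) arrays. Returns the first
    action of the refined plan (actions assumed bounded in [-1, 1]).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        actions = np.clip(mean + std * rng.standard_normal((pop, horizon, act_dim)), -1.0, 1.0)
        z = np.repeat(z0[None, :], pop, axis=0)
        returns = np.zeros(pop)
        # Roll every candidate through the latent model and sum predicted rewards.
        for t in range(horizon):
            returns += reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
        # Refit the sampling distribution to the highest-return sequences.
        elite = actions[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean[0]

# Toy usage: latent = position, action = displacement, reward = closeness to goal.
goal = np.array([0.8, -0.8])
first_action = cem_plan(
    np.zeros(2),
    dynamics=lambda z, a: z + a,
    reward=lambda z, a: -np.sum((z + a - goal) ** 2, axis=-1),
    act_dim=2,
    rng=np.random.default_rng(0),
)
```

Returning only the first action and replanning at every environment step is the standard model-predictive control pattern; in the toy problem the best first move points straight at the goal.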

Code implementations: 1 repo