LG, RO · Dec 19, 2023

Value Explicit Pretraining for Learning Transferable Representations

arXiv:2312.12339v2 · h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generalizing to unseen tasks in reinforcement learning, particularly in domains such as navigation and gaming, and offers incremental improvements over existing methods.

The paper tackles the problem of learning transferable representations for reinforcement learning. It proposes Value Explicit Pretraining (VEP), which uses a self-supervised contrastive loss to learn objective-conditioned representations, and reports up to a 2 times improvement in rewards and up to a 3 times improvement in sample efficiency on Atari and visual navigation benchmarks.

We propose Value Explicit Pretraining (VEP), a method that learns generalizable representations for transfer reinforcement learning. VEP enables learning of new tasks that share similar objectives with previously learned tasks, by learning an encoder for objective-conditioned representations, irrespective of appearance changes and environment dynamics. To pre-train the encoder from a sequence of observations, we use a self-supervised contrastive loss that results in learning temporally smooth representations. VEP learns to relate states across different tasks based on the Bellman return estimate that is reflective of task progress. Experiments using a realistic navigation simulator and the Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods in its ability to generalize to unseen tasks. VEP achieves up to a 2 times improvement in rewards on Atari and visual navigation, and up to a 3 times improvement in sample efficiency. For videos of policy performance, visit https://sites.google.com/view/value-explicit-pretraining/
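
The idea of relating states across tasks by their return estimate can be illustrated with a small sketch. This is not the authors' implementation. It assumes Monte Carlo discounted returns as the return estimate, picks as the positive for each state the state in another trajectory with the closest return, and scores pairs with a standard InfoNCE contrastive loss. The function names, the positive-selection rule, and the random stand-in embeddings are all hypothetical choices made for illustration; the loss and sampling scheme in the paper may differ.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo discounted return G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def nearest_return_positives(returns_a, returns_b):
    """For each state in trajectory A, index of the state in B with the closest return.

    Assumption: matching by nearest return is an illustrative pairing rule only.
    """
    diff = np.abs(returns_a[:, None] - returns_b[None, :])
    return diff.argmin(axis=1)

def info_nce(anchors, candidates, positive_idx, temperature=0.1):
    """InfoNCE loss: each anchor should score its positive above all other candidates."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = a @ c.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(a)), positive_idx].mean()

# Two toy trajectories from different "tasks" that share a sparse goal reward.
g_a = discounted_returns(np.array([0.0, 0.0, 1.0]), gamma=0.5)
g_b = discounted_returns(np.array([0.0, 0.0, 0.0, 1.0]), gamma=0.5)
pos = nearest_return_positives(g_a, g_b)

# Random vectors stand in for encoder outputs (hypothetical; no real encoder here).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(3, 8))
emb_b = rng.normal(size=(4, 8))
loss = info_nce(emb_a, emb_b, pos)
```

In this toy setup, states that are equally far from the goal in the two trajectories get paired, so minimizing the loss would pull their embeddings together regardless of how the observations look.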

