LG · AI · ML · Jun 4, 2023

ContraBAR: Contrastive Bayes-Adaptive Deep RL

arXiv:2306.02418v1 · 11 citations · h-index: 39
Originality: Incremental advance
AI Analysis

The paper addresses the computational challenges of meta RL for tasks with image observations, though the advance is incremental because it builds on existing contrastive learning approaches.

The paper tackles the problem of learning Bayes-optimal policies in meta reinforcement learning. It proposes ContraBAR, a method that uses contrastive predictive coding instead of variational inference. ContraBAR achieves performance comparable to state-of-the-art methods in state-based domains and enables learning in image-based domains.

In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task that is sampled from some known task distribution. Previous approaches tackled this problem by inferring a belief over task parameters, using variational inference methods. Motivated by recent successes of contrastive learning approaches in RL, such as contrastive predictive coding (CPC), we investigate whether contrastive methods can be used for learning Bayes-optimal behavior. We begin by proving that representations learned by CPC are indeed sufficient for Bayes optimality. Based on this observation, we propose a simple meta RL algorithm that uses CPC in lieu of variational belief inference. Our method, ContraBAR, achieves comparable performance to state-of-the-art in domains with state-based observation and circumvents the computational toll of future observation reconstruction, enabling learning in domains with image-based observations. It can also be combined with image augmentations for domain randomization and used seamlessly in both online and offline meta RL settings.
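For readers unfamiliar with CPC, the contrastive objective it is built on (InfoNCE) can be sketched in a few lines. This is a generic illustration, not the ContraBAR implementation: the function name `info_nce_loss`, the plain dot-product score, and the use of other batch rows as negatives are all assumptions made for the example, and how the paper builds its history and future embeddings is not shown here.

```python
import math


def info_nce_loss(context, future, temperature=1.0):
    """Generic InfoNCE loss over a batch of embedding pairs (illustrative only).

    context: list of B vectors, e.g. embeddings of the interaction history.
    future:  list of B vectors, embeddings of the matching future observations.
    Row i of `future` is the positive for row i of `context`; every other
    row in the batch is treated as a negative (an assumption of this sketch).
    """
    batch = len(context)
    total = 0.0
    for i in range(batch):
        # Dot-product score between context i and every candidate future.
        scores = [
            sum(c * f for c, f in zip(context[i], future[j])) / temperature
            for j in range(batch)
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the true pair among all candidates.
        total += log_norm - scores[i]
    return total / batch
```

When the embeddings carry no information (all scores equal), the loss equals the log of the batch size; it approaches zero as each context scores its own future far above the negatives. Note that the objective only compares embeddings, so nothing in it requires reconstructing future observations.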

Code Implementations: 1 repo
