LG · AI · Nov 29, 2021

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

arXiv:2111.14629v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses generalization in offline RL agents. The advance is incremental: it builds on prior self-supervised learning methods and adapts them to the offline setting.

The paper tackles poor zero-shot generalization in offline reinforcement learning. It proposes Generalized Similarity Functions (GSF), a framework that uses contrastive learning to estimate the similarity between observations from their expected future behavior, and reports improved performance on the offline Procgen benchmark.

Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but still exhibit difficulty in generalizing to scenarios not seen during training. While prior online approaches demonstrated that using additional signals beyond the reward function can lead to better generalization capabilities in RL agents, i.e. using self-supervised learning (SSL), they struggle in the offline RL setting, i.e. learning from a static dataset. We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations. We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior, where we quantify this similarity using generalized value functions. We show that GSF is general enough to recover existing SSL objectives while also improving zero-shot generalization performance on a complex offline RL benchmark, offline Procgen.
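To make the idea in the abstract concrete, the following is a minimal sketch of a contrastive objective in which observations count as "similar" when their generalized value function (GVF) estimates are close. It is an illustration under stated assumptions, not the authors' implementation: the quantile binning of GVF values, the function names `gvf_quantile_labels` and `contrastive_loss`, the cosine-similarity logits, and the `temperature` parameter are all hypothetical choices made for this example. It also assumes the GVF estimates and the encoder embeddings are already available as arrays.

```python
import numpy as np


def gvf_quantile_labels(gvf_values, num_bins):
    """Assign each observation a bin index based on quantiles of its GVF estimate.

    Assumption for this sketch: observations in the same quantile bin are
    treated as having similar expected future behavior.
    """
    # Interior quantile edges, e.g. num_bins=4 gives the 25%, 50%, 75% quantiles.
    fractions = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    edges = np.quantile(gvf_values, fractions)
    return np.searchsorted(edges, gvf_values, side="right")


def contrastive_loss(embeddings, labels, temperature=0.1):
    """InfoNCE-style loss: same-label pairs are positives, all others negatives.

    Expects at least two observations and non-zero embedding vectors.
    Anchors without any positive in the batch are skipped.
    """
    # Cosine similarity between all pairs of embeddings.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # exclude self-pairs

    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same bin label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0

    # Average log-probability of the positives for each anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())
```

In a training loop, the labels would be recomputed from the current GVF estimates for each batch, and the loss would be minimized with respect to the encoder that produces `embeddings`. Embeddings that cluster observations with similar GVF values yield a lower loss than embeddings that mix them.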

